Audio synchronization and delay estimation

ABSTRACT

Systems and techniques are provided for audio synchronization and delay estimation. Audio metadata including a first discrete Fourier transform representation may be received. An audio signal may be pre-processed. A second discrete Fourier transform representation may be generated from the pre-processed audio signal. A correlation result in a discrete Fourier transform representation may be generated based on an element-wise multiplication of the first and second discrete Fourier transform representations. An inverse Fourier transform may be performed on the correlation result in a discrete Fourier transform representation to generate a correlated signal including samples that may have a position and a value. A relative delay value may be determined based on the position of a sample having a value with the greatest magnitude. Playback of a second audio signal may be adjusted based on a current delay value adjusted based on the relative delay value.

BACKGROUND

The same audio signal can be delivered to both speakers in a venue, and to electronic devices within the venue. The sound produced by the speakers based on the audio signal may arrive at the location of an electronic device within the venue after the audio signal arrives at the electronic device. When the audio signal arriving at the electronic device is used to produce sound based on the audio signal, the sound from the speakers may be delayed compared to the sound produced by the electronic device. If the electronic device is mobile, the size of the delay between the sound from the speakers and the sound from the electronic device may change as the distance between the electronic device and the speakers changes.

BRIEF SUMMARY

According to implementations of the disclosed subject matter, an audio signal may be pre-processed at a transmitter to generate a transmitter pre-processed audio signal including samples, each sample including a value and having a position in the transmitter pre-processed audio signal. The positions of the samples of the transmitter pre-processed audio signal may be reversed to generate a reversed audio signal. A transmitter discrete Fourier transform representation may be generated from the reversed audio signal. The transmitter discrete Fourier transform representation may be transmitted from the transmitter to a receiver as audio metadata.

The audio metadata including the transmitter discrete Fourier transform representation may be received at a receiver. A second audio signal may be pre-processed at the receiver. A receiver discrete Fourier transform representation may be generated from the pre-processed second audio signal. A correlation result in a discrete Fourier transform representation may be generated based on an element-wise multiplication of the transmitter discrete Fourier transform representation and the receiver discrete Fourier transform representation. An inverse Fourier transform may be performed on the correlation result in a discrete Fourier transform representation to generate a correlated signal including samples, each sample of the correlated signal having a position in the correlated signal and a value. A relative delay value may be determined based on the position in the correlated signal of a sample comprising a value with the greatest magnitude of the values of the samples of the correlated signal. Playback of a third audio signal may be adjusted by the receiver based on a current delay value adjusted based on the relative delay value.

Systems and techniques disclosed herein may allow for audio synchronization and delay estimation. Additional features, advantages, and embodiments of the disclosed subject matter may be set forth or apparent from consideration of the following detailed description, drawings, and claims. Moreover, it is to be understood that both the foregoing summary and the following detailed description are examples and are intended to provide further explanation without limiting the scope of the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further understanding of the disclosed subject matter, are incorporated in and constitute a part of this specification. The drawings also illustrate embodiments of the disclosed subject matter and together with the detailed description serve to explain the principles of embodiments of the disclosed subject matter. No attempt is made to show structural details in more detail than may be necessary for a fundamental understanding of the disclosed subject matter and various ways in which it may be practiced.

FIG. 1 shows an example system suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter.

FIG. 2 shows an example system suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter.

FIG. 3A shows an example arrangement suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter.

FIG. 3B shows an example arrangement suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter.

FIG. 3C shows an example arrangement suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter.

FIG. 4 shows an example procedure suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter.

FIG. 5 shows an example procedure suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter.

FIG. 6 shows an example procedure suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter.

FIG. 7 shows a computer according to an embodiment of the disclosed subject matter.

FIG. 8 shows a network configuration according to an embodiment of the disclosed subject matter.

DETAILED DESCRIPTION

Audio synchronization and delay estimation may allow for the synchronization at the location of an electronic device of sound produced by the electronic device using a received audio signal with sound produced by speakers located some distance from the electronic device using another received signal from the same original source as the audio signal received at the electronic device. The delay between the audio signal arriving at the electronic device and the sound produced by the speakers arriving at the location of the electronic device may be estimated. The estimated delay may be used to delay production of sound by the electronic device using the received audio signal by slowing down playback of the audio signal. This may result in the sound produced by the electronic device synchronizing with the sound arriving at the location of the electronic device from the speakers, so that a listener at the location of the electronic device doesn't hear the sound from the audio signal being reproduced with an echo.

For example, a person attending a concert may have a personal audio device. The personal audio device may receive an audio signal transmitted wirelessly, for example, using Wi-Fi. The audio signal may be based on source audio signals generated based on sounds or signals from instruments, vocalists, or other audio sources that are part of the concert. For example, sound from instruments may be picked up by microphones on or near the instruments or from pickups attached to the instruments and turned into audio signals, vocals from vocalists may be picked up by microphones and turned into audio signals, and other audio sources, such as, for example, synthesizers, computers, or other electrical or electronic devices, may directly generate audio signals. The various source audio signals may be combined into the audio signal that may be transmitted wirelessly to the personal audio device. The audio signal may, for example, include a separate channel for each source audio signal. The audio signal may be processed, for example, mixed, equalized, and separated into channels based on the number and location of speakers in the venue. The processed audio signal may be transmitted, either through wires or wirelessly, to speakers placed throughout the venue of the concert. For example, if the processed audio signal is separated into channels, each speaker may receive the portion of the processed audio signal corresponding to the channel to be reproduced by that speaker. The speakers may use the processed audio signal to generate sound.

The audio signal may arrive at the personal audio device wirelessly before the sound from the speakers arrives at the location of the personal audio device. If the personal audio device uses the audio signal to produce sound from portions of the audio signal as they arrive, the sound produced by the personal audio device may precede the arrival of equivalent sound from the speakers. For example, the personal audio device may output sound through headphones worn by the person using the personal audio device, who may hear sound through the headphones before hearing the same sound as it arrives from the nearest speakers. In some venues, production of sound by the rear speakers based on the audio signal may be delayed in order to synchronize with sound arriving from the speakers at the front of the venue, further delaying the sound heard by a person near the rear speakers when compared to sound produced by the personal audio device based on the received audio signal. Production of sound by the personal audio device may be delayed to synchronize the sound produced by the personal audio device with the sound arriving at the location of the personal audio device from the nearest speakers.

An audio signal may be generated from any suitable number of source audio signals generated by any suitable number of audio sources. The source audio signals may make up any suitable number of channels. For example, each source audio signal may represent one channel, which may be a mono channel, or may be a stereo channel, for example, a left or right channel generated by a stereo pickup. Copies of the source audio signals may be sent to a transmitter and to mixing, equalizing, and amplifying devices. For example, the source audio signals may be transmitted through a wired analog or digital connection which may be physically split using a suitable analog or digital splitter, or may be transmitted wirelessly, through a wireless broadcast using any suitable wireless protocol which may be received by multiple devices, including, for example, the transmitter and the mixer.

The source audio signals received at the mixer may be mixed, for example, setting the relative volumes of each of the channels represented by the source audio signals. The mixer may combine the source audio signals into a mixed audio signal, which may include any suitable number of channels. The mixed audio signal may be analog or digital. For example, the mixer may be a digital mixer, and may convert input analog source audio signals into digital audio signals. The mixed audio signal may be input to the equalizer, which may adjust the volumes of the various sound frequencies in the mixed audio signal. The mixed and equalized audio signal may be input to an amplifier, which may amplify the mixed and equalized audio signal and provide the amplified audio signal to the speakers throughout the venue to be used to generate sound. The mixed and equalized audio signal may include any number of channels, such as, for example, two stereo channels to be sent to speakers located to the left and right of the audio sources, or multiple channels, for example, one channel for each speaker.

The source audio signals received at the transmitter may be processed to generate audio metadata to be sent with the audio signal to electronic devices, such as personal audio devices, within the venue. The transmitter may include a computing device and a wireless transmission device. For example, the transmitter may include a computer, such as a laptop, connected to a Wi-Fi router. The computing device may use any suitable combination of hardware and software to implement various signal processing techniques. For example, the computing device may be a general purpose computer running signal processing software, or may be a computing device including signal processing hardware used in conjunction with or in place of signal processing software. The source audio signals may be combined into a combined analog audio signal before being input to the computing device, or may be combined by the computing device. The source audio signals may also be sampled separately and combined into a multi-channel digital audio signal, which may, for example, include digital conversions of all of the source audio signals and may preserve channel information for the source audio signals.

The computing device of the transmitter may sample the combined analog audio signal at a suitable sample rate, such as, for example, 48 kHz, and may generate a digital audio signal. The analog audio signal may be sampled continuously as it arrives at the computing device of the transmitter, generating a continuous digital audio signal. For example, the computing device may include an Analog-to-Digital Converter (ADC), which may sample the input analog audio signal to generate a digital audio signal.

The digital audio signal may be filtered and down sampled. For example, the computing device may use an anti-aliasing filter with any suitable parameters to filter portions of the digital audio signal continuously as they are generated from the sampling of the combined analog audio signal. For example, the anti-aliasing filter may use a stop band frequency of 1500 Hz, a ripple of 1 dB, a stop band of −50 dB, and a pass band of 1150 Hz. The anti-aliasing filter may be implemented using any suitable combination of hardware and software. After being processed through the anti-aliasing filter, the filtered digital audio signal may be down sampled any suitable number of times. For example, the filtered digital audio signal may be down sampled by a factor of 16, resulting in a 3 kHz down sampled digital audio signal.

After being down sampled, the down sampled digital audio signal may be stored in an input array. The input array may be, for example, a data structure of any suitable size. For example, the input array may be a 2048 element array, and each element may store one of 2048 samples. The samples stored in the input array may represent any suitable length of the down sampled digital audio signal. For example, the samples stored in the input array may represent 682.7 ms of the down sampled digital audio signal. The input array may be stored on the computing device in any suitable manner, in any suitable storage hardware, including volatile and non-volatile storage. The sampling, filtering and down sampling of the combined analog audio signal may be continuous. For example, the combined analog audio signal may be sampled, filtered with the anti-aliasing filter, and down sampled as it is received at the computing device of the transmitter. The results of the down sampling may be continuously stored in the input array, for example, on a first-in first-out basis, with new samples pushing down older samples and the newest sample causing the oldest sample to exit the input array. The input array may be implemented as a first-in first-out queue using any suitable data structure. The samples stored in the input array may be the result of down sampling. For example, when the input array stores 2048 values, each representing a sample, the input array may represent 682.7 ms of the combined analog audio signal at a sampling rate of 3 kHz, down sampled from 32768 samples representing the same 682.7 ms of the combined analog audio signal at a sampling rate of 48 kHz, as sampled by the computing device on input of the combined analog audio signal.
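The figures in the example above can be checked with a short sketch. This is only an illustration under assumptions: the constant names and the `push_samples` helper are hypothetical, and a Python `deque` with a maximum length stands in for whatever first-in first-out structure an implementation might use.

```python
from collections import deque

INPUT_RATE_HZ = 48_000    # ADC sample rate from the example
DOWNSAMPLE_FACTOR = 16    # 48 kHz / 16 = 3 kHz
ARRAY_SIZE = 2048         # transmitter input array size from the example

down_rate_hz = INPUT_RATE_HZ // DOWNSAMPLE_FACTOR
array_duration_ms = 1000.0 * ARRAY_SIZE / down_rate_hz
source_samples_covered = ARRAY_SIZE * DOWNSAMPLE_FACTOR

# A deque with maxlen discards the oldest element when a new one is appended,
# which gives first-in first-out behavior for the input array.
input_array = deque(maxlen=ARRAY_SIZE)


def push_samples(fifo, samples):
    """Append down sampled samples; return True once the array is full."""
    fifo.extend(samples)
    return len(fifo) == fifo.maxlen


print(down_rate_hz, round(array_duration_ms, 1), source_samples_covered)
```

With these values the sketch reports a 3000 Hz rate, roughly 682.7 ms of audio per full array, and 32768 source samples at 48 kHz covering the same span.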

The down sampled digital audio signal, as stored in the input array, may have its root mean square (RMS) determined. For example, the computing device may calculate the value of the root mean square of the down sampled digital audio signal based on the 2048 values stored in the input array. The value of the root mean square may be stored, for example, in a transmission buffer, and may be part of audio metadata for the portion of the source audio signals, for example, the 682.7 ms of the source audio signals, that were combined, sampled, filtered, and down sampled to produce the down sampled digital audio signal stored in the input array. The value of the root mean square may be determined at intervals, such as, for example, at 682.7 ms when the input array is initially filled from continuous sampling of the combined analog audio signal, and once every 500 ms thereafter, reusing samples representing 182.7 ms of the combined analog audio signal that were used in the previous determination of the value of the root mean square.
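As a minimal sketch of the root mean square determination described above, the following computes the square root of the mean of the squared sample values. The function name is hypothetical, and returning 0.0 for an empty input is an assumption made only so the sketch is total.

```python
import math


def root_mean_square(samples):
    """Return sqrt(mean(x^2)) over the samples, or 0.0 if there are none."""
    count = len(samples)
    if count == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / count)


# A square wave of amplitude 1 has an RMS of exactly 1.
print(root_mean_square([1.0, -1.0, 1.0, -1.0]))
```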

The down sampled digital audio signal may be windowed. For example, the computing device may window the down sampled digital audio signal from the input array using a Tukey window. The Tukey window may use a number of points matching the size of the input array. For example, the input array may store 2048 samples for the down sampled digital audio signal, and the Tukey window may use 2048 points. The Tukey window may use a ratio of 0.2. The windowed digital audio signal may have its data flipped. For example, the computing device may reverse the order of the individual samples of the windowed digital audio signal through memory mapping. The reversed digital audio signal may be the windowed digital audio signal backwards. The down sampled digital audio signal may be windowed, and the resulting windowed digital audio signal flipped, at intervals, such as, for example, at 682.7 ms initially, and then once every 500 ms, at the same time the value of the root mean square is determined. The reversed digital audio signal and the value of the root mean square may both be determined from the same down sampled digital audio signal represented by, for example, the same elements of the input array sampled from the same portion of the combined analog audio signal received at the transmitter. In some implementations, other window types may be used to window the down sampled digital audio signal.
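The windowing and flipping steps can be sketched as follows. This assumes that the "ratio" of the Tukey window is the fraction of the window occupied by the two cosine tapers, which is the common definition of the Tukey (tapered cosine) window; the function names are hypothetical, and list slicing is used for the reversal in place of memory mapping.

```python
import math


def tukey_window(num_points, ratio=0.2):
    """Symmetric Tukey window; `ratio` is the tapered fraction (assumed)."""
    if num_points == 1 or ratio <= 0:
        return [1.0] * num_points
    last = num_points - 1
    taper = ratio * last / 2.0  # length of each cosine taper, in samples
    window = []
    for n in range(num_points):
        edge = min(n, last - n)  # distance to the nearer end of the window
        if edge < taper:
            window.append(0.5 * (1.0 - math.cos(math.pi * edge / taper)))
        else:
            window.append(1.0)
    return window


def window_and_reverse(samples, ratio=0.2):
    """Apply the Tukey window, then reverse the order of the samples."""
    coefficients = tukey_window(len(samples), ratio)
    windowed = [s * c for s, c in zip(samples, coefficients)]
    return windowed[::-1]


w = tukey_window(2048)
print(w[0], w[1024], w[2047])
```

Because the window is symmetric, windowing before reversal and reversal before windowing give the same result in this sketch.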

The reversed digital audio signal may be transformed to the frequency domain using a Fourier transform. For example, the computing device may implement a 2048 point fast Fourier transform (FFT) on the reversed digital audio signal to generate a discrete Fourier transform (DFT) for the reversed digital audio signal. The DFT representation of the reversed digital audio signal may include any suitable number of complex numbers. For example, the DFT representation generated by a 2048 point FFT may include 2048 complex numbers. The DFT representation may be a frequency domain representation of the reversed digital audio signal. The DFT representation may be normalized by dividing each component of each complex number in the DFT representation by the magnitude of the real or imaginary component with the greatest magnitude of any components of any of the complex numbers in the DFT representation. The DFT representation may be stored in the transmission buffer along with the value of the root mean square. In some implementations, only some of the complex numbers of the DFT representation may be stored in the transmission buffer. For example, for a DFT representation with 2048 complex numbers, only the first 1025 complex numbers may be stored in the transmission buffer. The 1st and 1025th complex numbers may be unique, but the 2nd through 1024th complex numbers and the 1026th through 2048th complex numbers may be mirrored, so that the full DFT representation of 2048 complex numbers may be reconstructed with only the first 1025 complex numbers.
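The normalization and the reduced storage described above can be sketched on a small example. The sketch assumes a real-valued input signal, for which the mirrored values are complex conjugates of one another. It uses a direct DFT rather than an FFT to stay short, and all function names are hypothetical.

```python
import cmath


def dft(samples):
    """Direct O(n^2) DFT; an FFT would give the same values faster."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def normalize(spectrum):
    """Divide by the largest real or imaginary component magnitude."""
    peak = max(max(abs(c.real), abs(c.imag)) for c in spectrum)
    if peak == 0.0:
        return list(spectrum)
    return [c / peak for c in spectrum]


def pack_half(spectrum):
    """Keep the first n/2 + 1 values (1025 of 2048 in the example)."""
    return spectrum[: len(spectrum) // 2 + 1]


def unpack_half(half, n):
    """Rebuild all n values, assuming the time domain signal was real."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full


signal = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.25, -0.5]
spectrum = normalize(dft(signal))
rebuilt = unpack_half(pack_half(spectrum), len(signal))
print(max(abs(a - b) for a, b in zip(rebuilt, spectrum)))
```

The printed reconstruction error should be at the level of floating point rounding.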

The wireless transmission device may transmit the audio metadata stored in the transmission buffer. For example, the wireless transmission device may use a radio signal of any suitable type, including, for example, a Wi-Fi signal, to transmit the DFT representation, or some subset of the complex numbers of the DFT representation, of the reversed digital audio signal and the value of the root mean square stored in the transmission buffer. The audio metadata may be transmitted along with the multi-channel digital audio signal generated from the same section of the source audio signals that was used to generate the audio metadata. For example, the multi-channel digital audio signal, including digital conversions of all of the source audio signals and channel information for the source audio signals for the same 682.7 ms section of the source audio signals that were combined and used to generate the DFT representation and for which the value of the root mean square was determined, may be transmitted along with the audio metadata. The multi-channel digital audio signal and the audio metadata may be transmitted separately using separate wireless communication channels, bandwidth, or frequency. Alternatively, the multi-channel digital audio signal and the audio metadata may be modulated or multiplexed together and transmitted using a single communication channel, bandwidth or frequency. For example, the multi-channel digital audio signal and the audio metadata may be encoded using a Quadrature Amplitude Modulation (QAM) technique, such as 16-bit QAM.

The transmitter may generate and transmit audio metadata at any suitable rate. For example, while the sampling, alias filtering, and down sampling of the combined analog audio signal may be continuous, the audio metadata may be generated initially after 682.7 ms, and then once every 500 ms. The first transmission may not occur until 682.7 ms of the combined analog audio signal have been sampled in order to ensure the input array is filled with samples, after which subsequent audio metadata may be generated and transmitted every 500 ms based on a set of samples that includes some samples used to generate the immediately previous audio metadata. For example, the last 182.7 ms of samples used to generate the previous audio metadata may be used to generate the subsequent audio metadata. The reused samples may remain in the input array to be reused, as they may not yet have been pushed out of the input array.
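The timing in this example follows from the array size and the down sampled rate, as the sketch below shows. The constant and function names are hypothetical, and the values are those of the example rather than required parameters.

```python
DOWN_RATE_HZ = 3000   # down sampled rate from the example
ARRAY_SIZE = 2048     # samples per metadata frame
HOP_MS = 500          # interval between metadata frames after the first

hop_samples = DOWN_RATE_HZ * HOP_MS // 1000   # new samples per frame
reused_samples = ARRAY_SIZE - hop_samples     # samples shared with previous frame
reused_ms = 1000.0 * reused_samples / DOWN_RATE_HZ


def metadata_times_ms(count):
    """Times at which the first `count` metadata frames can be generated."""
    first = 1000.0 * ARRAY_SIZE / DOWN_RATE_HZ
    return [first + i * HOP_MS for i in range(count)]


print(hop_samples, reused_samples, round(reused_ms, 1))
print([round(t, 1) for t in metadata_times_ms(3)])
```

Each 500 ms interval contributes 1500 new samples, so 548 samples, or about 182.7 ms, carry over from one frame to the next.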

The transmitted audio metadata and multi-channel digital audio signal may be received at an electronic device. For example, a personal audio device used by a person in the venue may include a wireless communications device, such as, for example, a Wi-Fi radio, which may receive wireless transmissions from the transmission device. The electronic device may generate sound based on the received multi-channel digital audio signal. For example, the personal audio device may include any suitable combination of hardware and software for driving a speaker based on a digital audio signal. Headphones, which may be wired or wireless, may be connected to the personal audio device and may be worn by the person. The electronic device may allow for the manipulation of the multi-channel digital audio signal. For example, the personal audio device may allow the person to adjust the mixing of the channels and equalization of the frequencies in the multi-channel digital audio signal. This may allow, for example, a person to change the relative volumes of instruments and vocals, or emphasize or deemphasize frequencies, in the multi-channel digital audio signal, changing the sound generated by the headphones using the multi-channel digital audio signal. The multi-channel digital audio signal may be buffered or otherwise held in any suitable memory of the electronic device as it is received and before being used to generate sound. This may allow playback of the multi-channel digital audio signal by the electronic device to be delayed or sped up. The amount of time it takes sound generated by speakers based on parts of the same section of the source audio signals as the audio metadata to arrive at the location of the electronic device after the arrival of the multi-channel digital audio converted from the same section of the source audio signals may be determined based on the audio metadata. This amount of time may be a delay between the sound from the speakers and the multi-channel digital audio signal from the transmitter. When the sound generated by the speakers is delayed, only part of the section of the source audio signals may have been used by the speakers to generate sound by the time the entire section of the source audio signals arrives at the electronic device as the multi-channel digital audio signal.

The electronic device may include a microphone. The microphone may generate an analog audio signal based on sounds in the surrounding environment. For example, a microphone on a personal audio device used by a person in a venue may generate an analog audio signal based on sounds in the venue, including, for example, sounds being played over speakers. The sounds being played over the speakers may be generated based on the audio signal sent to the speakers, for example, from the amplifier, and may be based on the copies of the source audio signals that were sent to the mixer.

The analog audio signal generated by the microphone may be sampled by the electronic device at a suitable sample rate, such as, for example, the same sample rate used by the computing device of the transmitter, which may be 48 kHz. The analog audio signal generated by the microphone may be sampled continuously as sound arrives at the location of the electronic device and is converted to an analog audio signal by the microphone. The sampling may generate a continuous digital audio signal. For example, the electronic device may include an Analog-to-Digital Converter (ADC), which may sample the analog audio signal generated by the microphone to generate a digital audio signal.

The digital audio signal may be filtered and down sampled. For example, the electronic device may use an anti-aliasing filter with any suitable parameters to filter the portions of the digital audio signal continuously as they are generated from the sampling of the analog audio signal generated by the microphone. For example, the anti-aliasing filter of the electronic device may use the same parameters as the anti-aliasing filter of the computing device of the transmitter, including a stop band frequency of 1500 Hz, a ripple of 1 dB, a stop band of −50 dB, and a pass band of 1150 Hz. The anti-aliasing filter may be implemented using any suitable combination of hardware and software. After being processed through the anti-aliasing filter, the filtered digital audio signal may be down sampled the same number of times that the filtered digital audio signal generated on the computing device of the transmitter is down sampled. For example, the filtered digital audio signal may be down sampled by a factor of 16, resulting in a 3 kHz down sampled digital audio signal.

After being down sampled, the down sampled digital audio signal may be stored in an input array. The input array may be, for example, a data structure of any suitable size, and may store more data than the input array on the computing device of the transmitter. For example, the input array may be a 3000 element array, and each element may store one of 3000 samples. The samples stored in the input array may represent any suitable length of the audio signal. For example, the samples stored in the input array may represent 1000 ms of the audio signal at 3 kHz. The input array may be stored on the electronic device in any suitable manner, in any suitable storage hardware, including volatile and non-volatile storage. The sampling, filtering and down sampling of the analog audio signal generated by the microphone may be continuous. For example, the analog audio signal generated by the microphone may be sampled, filtered with the anti-aliasing filter, and down sampled as it is generated by the microphone of the electronic device. The results of the down sampling may be continuously stored in the input array, for example, on a first-in first-out basis, with new samples pushing down older samples and the newest sample causing the oldest sample to exit the input array. The input array may be implemented as a first-in first-out queue using any suitable data structure. The samples stored in the input array may be the result of down sampling. For example, when the input array stores 3000 values, each representing a sample, the input array may represent 1000 ms of the analog audio signal generated by the microphone at a sampling rate of 3 kHz, down sampled from 48000 samples representing the same 1000 ms of the analog audio signal generated by the microphone at a sampling rate of 48 kHz, as sampled by the electronic device from the analog audio signal.

A section of the down sampled digital audio signal may be windowed. For example, the electronic device may window a section of the down sampled audio signal from the input array of the same size as the input array on the computing device of the transmitter using a Tukey window. The Tukey window may use a number of points matching the size of the section of the down sampled audio signal from the input array, which may be the same number of points used by the Tukey window on the computing device of the transmitter. For example, the input array may store 3000 samples of the down sampled digital audio signal with 2048 samples being windowed, and the Tukey window may use 2048 points. The Tukey window may use a ratio of 0.2. The down sampled digital audio may be windowed at intervals. The windowing of the down sampled digital audio signal from the input array may coincide with the receiving at the electronic device of a section of specified length of the multi-channel digital audio signal. For example, once the electronic device begins receiving the multi-channel digital audio signal, the first windowing of the down sampled digital audio in the input array may occur after a 682.7 ms section of the multi-channel digital audio, and accompanying audio metadata, is received, which may be after 682.7 ms, as the multi-channel digital audio signal may be streamed in real time. Subsequent windowing may occur every 500 ms, after the receiving of 500 ms of the multi-channel digital audio signal and audio metadata for the 500 ms of the source audio signals on which the multi-channel digital audio signal is based and for the immediately previous 182.7 ms of the source audio signals. The section of the down sampled audio signal from the input array that is windowed may be selected based on a current delay, for example, as determined using a histogram. For example, if the current delay is 160 ms, the 2048 sample, 682.7 ms, section of the down sampled audio signal from the input array may start at 157.3 ms into the input array, which may store 1000 ms worth of samples, and may end at 840 ms into the input array. The current delay may be set to any suitable value initially, such as, for example, 0 ms, or a value based on a known distance between the electronic device and a speaker.
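The section boundaries in the 160 ms example can be reproduced with a short sketch. It rests on an assumption inferred only from those figures: that the section ends one current delay before the end of the 1000 ms input array and extends 2048 samples back from there. The function name and parameters are hypothetical.

```python
def select_section(current_delay_ms, array_size=3000, section_size=2048,
                   rate_hz=3000):
    """Return (start, end) sample indexes of the section to window.

    Assumes the section ends `current_delay_ms` before the end of the
    input array, which matches the 160 ms example in the text.
    """
    delay_samples = round(current_delay_ms * rate_hz / 1000)
    end = array_size - delay_samples
    start = end - section_size
    if start < 0 or end > array_size:
        raise ValueError("current delay does not fit in the input array")
    return start, end


start, end = select_section(160)
print(start, end)                                   # sample indexes
print(round(1000 * start / 3000, 1), 1000 * end / 3000)  # positions in ms
```

Under this assumption a 3000 sample array can accommodate current delays from 0 ms up to about 317 ms before the section would run off the start of the array.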

The windowed digital audio signal may be transformed to the frequency domain using a Fourier transform. For example, the electronic device may implement a 2048 point fast Fourier transform (FFT) on the windowed digital audio signal to generate a DFT representation of the windowed digital audio signal. The DFT representation of the windowed digital audio signal may include any suitable number of complex numbers. For example, the DFT representation generated by a 2048 point FFT may include 2048 complex numbers. The DFT representation may be a frequency domain representation of the windowed digital audio signal. The DFT representation of the windowed digital audio signal may be normalized, for example, in the same manner as the DFT representation of the reversed digital audio signal on the computing device of the transmitter.

The DFT representation of the windowed digital audio signal may be multiplied by the DFT representation of the reversed digital audio signal received as part of the audio metadata. For example, the electronic device may implement an element-wise multiplication of the DFT representation of the windowed digital audio signal and the DFT representation of the reversed digital audio signal, resulting in a correlation result in the DFT representation. The multiplication of the DFT representations may correspond to a convolution of the time domain representations used to generate the DFT representations, for example, the values of the input array of the electronic device and the reversed values of the input array of the computing device of the transmitter for the windowed digital audio signal and the reversed digital audio signal.
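The correspondence between element-wise multiplication and convolution can be checked numerically on a small example. The sketch below compares the inverse DFT of the product with a directly computed circular convolution. It uses a direct DFT for brevity, and all function names are hypothetical.

```python
import cmath


def dft(samples):
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n for t in range(n)]


def multiply_spectra(a, b):
    """Element-wise product of two DFT representations of equal length."""
    return [x * y for x, y in zip(a, b)]


def circular_convolution(a, b):
    """Direct time domain circular convolution, for comparison only."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


a = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 2.5]
b = [0.5, -1.0, 2.0, 0.0, 1.0, 3.0, -2.0, 1.5]
via_dft = [v.real for v in idft(multiply_spectra(dft(a), dft(b)))]
direct = circular_convolution(a, b)
print(max(abs(x - y) for x, y in zip(via_dft, direct)))
```

Because one of the two signals has been reversed at the transmitter, this convolution acts as a cross-correlation of the two original signals, which is why a peak in the result can indicate their relative offset.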

The correlation result in the DFT representation may be phase transformed. For example, phase transform (PHAT) weighting may be applied to the correlation result in the DFT representation by the electronic device. The PHAT weighting may divide each complex number of the correlation result in the DFT representation by its own absolute value. The frequencies represented in the resulting PHAT weighted DFT representation may have their amplitudes set to 1 by the PHAT weighting, while phase data for each of the frequencies may be maintained.

The PHAT weighted DFT representation may be transformed to the time domain. For example, the electronic device may implement an inverse FFT on the PHAT weighted DFT representation. The inverse FFT may generate any suitable number of samples for the time domain representation of the PHAT weighted DFT representation. For example, the inverse FFT may generate a number of samples corresponding to the number of samples in the input arrays of the electronic device and the computing device of the transmitter, such as 2048 samples. The time domain representation of the PHAT weighted DFT representation may represent an audio signal that would be the result of convolving the windowed digital audio signal and the reversed digital audio signal with amplitude information removed.
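As an illustrative sketch only, the element-wise multiplication, PHAT weighting, and inverse transform may be expressed in Python as follows. The names dft, idft, and phat_correlate are hypothetical. A naive DFT stands in for the 2048 point FFT, a 32 sample pseudo-random signal is used for brevity, and the delay is modeled as a circular shift; these are assumptions of the sketch and not features of the described system.

```python
import cmath
import random


def dft(x):
    """Naive O(N^2) DFT, standing in for the FFT described in the text."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def phat_correlate(windowed, reversed_reference):
    """Multiply the two DFTs, apply PHAT weighting, return time domain samples."""
    product = [a * b for a, b in zip(dft(windowed), dft(reversed_reference))]
    weighted = []
    for p in product:
        magnitude = abs(p)
        # Divide each complex number by its own absolute value (skip empty bins).
        weighted.append(p / magnitude if magnitude > 1e-12 else 0j)
    return [v.real for v in idft(weighted)]


rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(32)]
delay = 5  # assumed delay, in samples
received = [reference[(n - delay) % 32] for n in range(32)]  # circular shift

correlated = phat_correlate(received, reference[::-1])
peak_index = max(range(32), key=lambda i: abs(correlated[i]))
```

Under these assumptions the peak falls at zero-based index (delay − 1) modulo the signal length, so a delay of zero places the peak on the last sample.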

The time domain representation of the PHAT weighted DFT representation may be searched for the sample with the greatest amplitude. For example, the electronic device may perform any suitable search on the values of the samples of the time domain representation of the PHAT weighted DFT representation to determine which sample has the highest value, indicating the greatest amplitude. The position of the sample with the greatest amplitude may indicate the amount by which the analog audio signal generated by the microphone of the electronic device, and the sound from the speakers, is delayed compared to the multi-channel digital audio signal received from the transmitter, relative to a previously determined delay value. For example, if the sample with the greatest amplitude is the first or last sample of the time domain representation of the PHAT weighted DFT representation, this may indicate that there is no delay relative to the current delay value, and the relative delay may be 0 ms. The value of the relative delay may be determined based on the location of the sample with the greatest amplitude. For example, if the sample with the greatest amplitude is located after the 1st sample and before the middle sample, for example, the 1024th sample, the relative delay may be positive. Otherwise, if the sample with the greatest amplitude is located after the middle sample, the relative delay may be negative. The magnitude of the relative delay may increase as the sample with the greatest amplitude approaches the middle sample of the time domain representation of the PHAT weighted DFT representation.
For example, the relative delay value, in seconds, may be determined according to, for 1≤x≤S/2, (x−1)/F, and for S/2<x≤S, (x−S)/F, where S is the total number of samples in the time domain representation of the PHAT weighted DFT representation, x is the sample number of the sample with the greatest amplitude, and F is the sampling frequency in Hz of the time domain representation of the PHAT weighted DFT representation. If the time domain representation includes more than one sample with the greatest amplitude, any sample with the greatest amplitude may be chosen to determine the relative delay.
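The mapping from peak position to relative delay may be sketched as follows, reading the two ranges as 1 ≤ x ≤ S/2 and S/2 < x ≤ S so that positions in the first half give positive delays and positions in the second half give negative delays. The function name relative_delay_ms and the default values of 2048 samples and 3000 Hz are assumptions taken from the examples in this description.

```python
def relative_delay_ms(x, total_samples=2048, sample_rate_hz=3000):
    """Relative delay in ms for a peak at 1-based sample number x."""
    if not 1 <= x <= total_samples:
        raise ValueError("sample number out of range")
    if x <= total_samples // 2:
        # First half: (x - 1) / F, a positive (or zero) delay.
        return (x - 1) * 1000.0 / sample_rate_hz
    # Second half: (x - S) / F, a negative (or zero) delay.
    return (x - total_samples) * 1000.0 / sample_rate_hz
```

For instance, with these defaults a peak at sample 31 corresponds to a relative delay of 10 ms, and a peak at sample 2018 corresponds to −10 ms.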

The value of the relative delay may be added to a histogram. For example, the electronic device may include a controller which may control the output of the multi-channel digital audio signal from the electronic device to a sound generating device, such as headphones connected to a personal audio device. The histogram may include any suitable number of bins, and each bin may represent a range of delay values, for example, in milliseconds. For example, the histogram may use an interval of 4 ms, and may have enough bins to represent the entire length of the section of the source audio signals represented by the audio metadata. The histogram may, for example, have 395 bins of 4 ms each, starting with a bin representing a delay of −80 ms to −76 ms, and ending with a bin representing a delay of 1496 ms to 1500 ms. The value of the relative delay may be weighted in any suitable manner before being added to the histogram. For example, the value of the relative delay may be weighted according to the value of the root mean square from the audio metadata. The value of the relative delay may be weighted to 0, and therefore discarded, if the value of the root mean square from the audio metadata is less than 300, weighted to 1 if the value of the root mean square is 300 to 2000, and weighted to 2 if the value of the root mean square is above 2000. Discarded relative delay values may have been determined based on sections of the source audio signals which contain little or no sound, resulting in a very low root mean square value and indicating a lack of activity from the audio sources.
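The example weighting thresholds and bin layout above may be sketched as follows. The names rms_weight and bin_index are hypothetical, and treating each bin as including its lower edge and excluding its upper edge is an assumption, since the description does not say which bin a boundary value falls into.

```python
BIN_WIDTH_MS = 4
FIRST_BIN_START_MS = -80
NUM_BINS = 395  # covers -80 ms up to 1500 ms


def rms_weight(rms):
    """Weight for a relative delay value based on the RMS from the metadata."""
    if rms < 300:
        return 0  # discarded: little or no sound in the section
    if rms <= 2000:
        return 1
    return 2


def bin_index(delay_ms):
    """Zero-based histogram bin for a delay value in ms."""
    index = int((delay_ms - FIRST_BIN_START_MS) // BIN_WIDTH_MS)
    if not 0 <= index < NUM_BINS:
        raise ValueError("delay outside the histogram range")
    return index
```

With this layout, a relative delay of 2.5 ms falls into bin 20, which is the bin representing 0 ms to 4 ms.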

Any suitable number of relative delay values may be used in the histogram. For example, the histogram may use 26 relative delay values, which may be replaced on a first-in first-out basis. The values of the relative delays used in the histogram may be weighted for recency. For example, the histogram may use 26 relative delay values, with the 13 most recent relative delay values weighted by a factor of 2 in addition to any weighting of those relative delay values based on the values of their corresponding root mean squares, and the 13 oldest relative delay values may be weighted by a factor of 1. For example, the most recent value for a relative delay to enter the histogram may have a corresponding root mean square of 2500, resulting in the value for the relative delay being weighted by a factor of 4. The histogram may also use any other suitable weightings for relative delay values. When a relative delay value is added to the histogram, the count for the bin of the histogram corresponding to the relative delay value may be increased according to the weighting of the relative delay. For example, the most recent value for a relative delay to enter the histogram may be 2.5 ms, and may have a weighting of 4, resulting in the count for the 0 ms to 4 ms bin of the histogram increasing by 4. As new relative delay values enter the histogram, counts from the oldest relative delay value, which falls out of the histogram, may be removed from the appropriate bin, and changes in recency weightings may result in changes to the counts of any bins of the histogram.
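One possible way to keep the histogram consistent as values age is to store the most recent relative delay values in a bounded queue and recompute the bin counts on demand. The sketch below takes that approach; the class name DelayHistogram, its methods, and the choice to recompute rather than update counts incrementally are all assumptions made for illustration.

```python
from collections import Counter, deque


class DelayHistogram:
    """Keeps the newest relative delay values and derives weighted bin counts."""

    def __init__(self, capacity=26, recent=13, bin_width_ms=4):
        # Each entry is (delay_ms, rms_weight); the newest entry is last.
        self.entries = deque(maxlen=capacity)
        self.recent = recent
        self.bin_width_ms = bin_width_ms

    def add(self, delay_ms, rms_weight):
        if rms_weight > 0:  # a weight of 0 means the value is discarded
            self.entries.append((delay_ms, rms_weight))

    def counts(self):
        """Map each bin's starting delay in ms to its weighted count."""
        counts = Counter()
        for age, (delay_ms, weight) in enumerate(reversed(self.entries)):
            recency = 2 if age < self.recent else 1
            start = int(delay_ms // self.bin_width_ms) * self.bin_width_ms
            counts[start] += weight * recency
        return counts


histogram = DelayHistogram()
histogram.add(2.5, 2)  # RMS above 2000 and most recent: weight 2 * 2 = 4
first_counts = histogram.counts()
```

Here a new 2.5 ms value with a root mean square weight of 2 adds 4 to the 0 ms to 4 ms bin, and its contribution drops to 2 once 13 newer values have entered.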

The current delay may be adjusted based on the histogram. For example, the electronic device may determine an adjustment to the current delay based on the counts for the various bins of the histogram and the delay values to which the bins correspond. The adjustment to the current delay may be determined in any suitable manner. For example, the histogram bin with the highest count may be used as the adjustment to the current delay, based on, for example, the average delay represented by that histogram bin. For example, a histogram may have a count of 16 in the 0 ms to 4 ms bin, a count of 28 in the 4 ms to 8 ms bin, a count of 10 in the 8 ms to 12 ms bin, and a count of 6 in the 12 ms to 16 ms bin. The adjustment to the current delay may be 6 ms, which may be the average delay of the 4 ms to 8 ms bin, which may have the highest count. If the current delay was, for example, 40 ms, the current delay may be adjusted by 6 ms, to 46 ms. The current delay may also be adjusted downwards, for example, if a bin representing a negative relative delay has the highest count. For example, if the current delay is 40 ms, and a bin representing a delay from −8 ms to −4 ms has the highest count, the current delay may be adjusted downwards by 6 ms, to 34 ms, based on the average of −6 ms of the bin with the highest count.
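The example adjustment above may be reproduced with the following sketch, which picks the bin with the highest count and uses the midpoint of that bin as the adjustment. The function name delay_adjustment_ms and the dictionary keyed by each bin's starting delay are assumptions; how ties between bins are broken is not specified, and this sketch simply takes the first such bin.

```python
def delay_adjustment_ms(counts, bin_width_ms=4):
    """Midpoint, in ms, of the histogram bin with the highest count.

    `counts` maps each bin's starting delay in ms to its count.
    """
    best_start = max(counts, key=counts.get)
    return best_start + bin_width_ms / 2


counts = {0: 16, 4: 28, 8: 10, 12: 6}
current_delay_ms = 40 + delay_adjustment_ms(counts)  # 40 ms adjusted by 6 ms
```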

The current delay, as adjusted based on the adjustment determined by the histogram, may be used to adjust the data from the input array of the electronic device used when determining the next relative delay value. For example, the input array may store 3000 samples, of which 2048 samples may be used to determine a relative delay value. The oldest sample, for example, the 3000th sample of the input array, may represent 0 ms, and the newest sample, for example, the 1st sample of the input array, may represent 1000 ms. The samples in the input array may represent a total of 1000 ms of audio signal. When determining the next relative delay value, the samples windowed using the Tukey window may be samples starting at the current delay value and going up to 682.7 ms from the current delay value. For example, if the current delay value is 0 ms, which it may be initially, the samples from the input array that are windowed may be from 0 ms, which may be the oldest sample in the input array, for example, the 3000th sample, to 682.7 ms, which may be the 953rd sample, for a total of 2048 samples. If the current delay value is 160 ms, the samples that are windowed may be from 160 ms, which may be the 2520th sample, to 842.7 ms, which may be the 473rd sample. This may allow the relative delay value to be determined relative to the current delay value as adjusted based on the histogram.
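The selection of samples from the input array may be sketched as follows, using the numbering in which the 1st sample is the newest and the 3000th sample is the oldest. The function name window_positions is hypothetical, and the sketch assumes an inclusive count, under which 2048 positions starting at position 3000 end at position 953 and 2048 positions starting at position 2520 end at position 473.

```python
def window_positions(current_delay_ms, array_size=3000, window_size=2048,
                     sample_rate_hz=3000):
    """Return (start, end) 1-based positions of the samples to be windowed.

    Position `array_size` is the oldest sample (0 ms); position 1 is the newest.
    """
    offset = round(current_delay_ms * sample_rate_hz / 1000)  # delay in samples
    start = array_size - offset
    end = start - window_size + 1  # inclusive count of window_size samples
    if end < 1:
        raise ValueError("current delay too large for the input array")
    return start, end
```

With these defaults, the largest current delay that still leaves 2048 samples available is about 317 ms.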

The current delay may be used to adjust the generation of sound by the electronic device based on the multi-channel digital audio signal received from the transmitter. For example, the current delay may indicate the amount of time by which the multi-channel digital audio signal from the transmitter is ahead of the analog audio signal being generated by the microphone based on sound from the speakers. To synchronize the sound a person using the electronic device hears from the speakers with the sound the person would hear using headphones connected to the electronic device, a digital audio signal based on the multi-channel digital audio signal from the transmitter, which is used to generate sound through the headphones, may be delayed, or sped up, according to the current delay.

For example, the current delay may be output to an audio asynchronous resampler. The audio asynchronous resampler may, for example, resample a stereo digital audio signal generated from the multi-channel digital audio signal received from the transmitter before the stereo digital audio signal is used to generate sound. The resampling may slow down the playback of the stereo digital audio signal, for example, by duplicating samples, in order to increase the amount of time over which a section of the stereo digital audio signal is used to generate sound. This may cause an increase in playback time for a section of the stereo digital audio signal, for example, causing a section that originally represented 100 ms of audio to take 102 ms to play back, implementing a delay of 2 ms. The audio asynchronous resampler may implement resampling to delay the playback of the digital audio signal generated from the multi-channel digital audio signal in any suitable manner, and over any suitable period of time. The audio asynchronous resampler may avoid implementing too great of a delay over too short of a period of time, as the resampling may cause audio artifacts or noticeable changes in pitch in the generated sound. The audio asynchronous resampler may also speed up the playback of the multi-channel digital audio signal, for example, by dropping samples, and may also avoid implementing too great of a speed up over too short of a time period.

The current delay may be the total delay needed at the time the current delay is determined, and previous delays implemented by the audio asynchronous resampler may count toward this total. For example, the first current delay determined at the electronic device may be 10 ms. The audio asynchronous resampler may have delayed playback of the stereo digital audio signal by 5 ms when the current delay is adjusted to 9 ms. Because playback of the stereo digital audio signal is already delayed by 5 ms, the audio asynchronous resampler may only delay playback of the stereo digital audio signal by an additional 4 ms, bringing the total delay to 9 ms, matching the current delay. If the next current delay is not 9 ms, for example, is 7 ms or 10 ms, the audio asynchronous resampler may delay or speed up playback of the stereo digital audio signal accordingly, for example, implementing a 2 ms speed up to bring the total delay to 7 ms, or implementing an additional 1 ms delay to bring the delay to 10 ms. The current delay may change, for example, due to a person moving and changing the distance between the microphone of the electronic device and the nearest speaker, or due to changes in the transmission environment for the transmitter. When the total delay matches the most recent current delay, the audio asynchronous resampler may allow the digital audio signal generated from the multi-channel digital audio signal, for example, the stereo audio signal, to play back without delays or speedups.
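The bookkeeping in this example may be sketched as follows. The class name DelayTracker and the per-step limit max_step_ms are hypothetical; the limit is only one assumed way of avoiding too great a delay or speed up over too short a period, and the actual resampling of audio samples is not shown.

```python
class DelayTracker:
    """Tracks the delay already applied and the change still needed."""

    def __init__(self, max_step_ms=5.0):
        self.applied_ms = 0.0           # total delay implemented so far
        self.max_step_ms = max_step_ms  # assumed limit on change per step

    def step(self, current_delay_ms):
        """Return the signed change to apply now (positive delays playback)."""
        remaining = current_delay_ms - self.applied_ms
        change = max(-self.max_step_ms, min(self.max_step_ms, remaining))
        self.applied_ms += change
        return change


tracker = DelayTracker(max_step_ms=5.0)
first = tracker.step(10)   # only 5 ms of the 10 ms is applied in this step
second = tracker.step(9)   # current delay adjusted to 9 ms: 4 ms more
third = tracker.step(7)    # current delay adjusted to 7 ms: 2 ms speed up
```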

The transmitter may be aware of the distance between the transmitter and the closest speaker. The transmitter may use this distance to determine a delay used in transmitting the multi-channel digital audio signal and audio metadata. This may reduce the delays, and the average delay, determined by the electronic device, and may allow for the sound generated by the electronic device to be synchronized with the sound generated by the nearest speaker more quickly. In some implementations, there may be multiple transmitters in a venue. Each transmitter may be aware of the distance between itself and the front of house speaker that would be closest to people within range of the transmitter, and may use this distance to implement a delay in transmitting the multi-channel digital audio signal and the audio metadata.

Any suitable timing data may be used by the transmitter and electronic device to ensure that the appropriate section of the analog audio signal generated by the microphone is used to generate the DFT representation on the electronic device that is multiplied with the DFT representation of the reversed digital audio signal received as part of the audio metadata. The multi-channel digital audio signal from the transmitter may be streamed continuously to the electronic device, as the audio sources may be live.

In some implementations, the audio sources may generate digital audio signals. The digital source audio signals may not need to be initially sampled by the transmitter, as they may already be digitally sampled. The digital source audio signals may still be combined and down sampled before being filtered.

In some implementations, the microphone of the electronic device may be a digital microphone that generates digital audio signals. The digital audio signal generated by the microphone may not need to be initially sampled, as it may already be digitally sampled. The digital audio signal may still be down sampled before being filtered.

The discrete Fourier transform representations may be in any suitable format, and may use complex numbers, polar notation, or any other suitable representation type.

The computing device of the transmitter and the receiver may each use any suitable combination of general and special purpose hardware and software for signal processing. For example, the computing device may use general purpose central processing units (CPUs), graphics processing units (GPUs), other special-purpose processors which may run software for implementing various signal processing techniques, dedicated hardware such as ADCs, digital to analog converters (DACs), hardware filters, field programmable gate arrays (FPGAs), or other special-purpose hardware.

FIG. 1 shows an example system suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter. A venue may be any environment in which audio is played back, for example, from live sources, for an audience. The venue may include audio sources 101, audio processing 105, a transmitter 110, and speakers 151, 152, 153, and 154. The receivers 120, 121, 122, and 123 may be electronic devices used by persons within the venue, such as, for example, personal audio devices.

The audio sources 101 may be any suitable sources of audio signals. For example, the audio sources 101 may include any number of microphones or pickups to convert sound from instruments or vocalists to analog audio signals. The audio sources 101 may include any number of synthesizers, computing devices, or other electric or electronic devices which may generate audio signals directly and which may not generate sound without a speaker. The analog audio signals from the audio sources 101 may be source audio signals, and may be sent to the audio processing 105 and the transmitter 110. For example, the source audio signals may be split using a suitable analog or digital splitter.

The transmitter 110 may be any suitable device or devices for processing audio signals and transmitting data, including processed audio signals, wirelessly. For example, the transmitter 110 may include a computing device, such as, for example, a desktop, laptop, tablet, smartphone, or local or remote server, that may include software and hardware for processing digital audio signals. The computing device may be able to receive analog audio signals, such as the separate or combined source audio signals from the audio sources 101, as input. The transmitter 110 may also include a transmission device. The transmission device may be any suitable device for the wireless transmission of data over distances, and may be, for example, a standalone wireless router or access point connected to the computing device through any suitable wired or wireless connection, or may be a component of the computing device, such as a wireless card with any suitable wireless radio. The transmission device may use any suitable type of wireless communication, such as, for example, Wi-Fi, Bluetooth, analog radio, or digital radio. The transmission device may have any suitable range. For example, a single transmission device may have sufficient range to transmit data to any part of a venue, or may only be able to cover a portion of the venue. The number of transmission devices within a venue may be based on the size of the venue and the range of the transmission devices. In some implementations, the transmitter 110 may include multiple transmission devices connected to a computing device. In some implementations, multiple transmitters such as the transmitter 110, including computing devices and transmission devices, may be distributed throughout the venue and may be connected to the source audio signals.

The transmitter 110 may transmit the wireless signal 171. The wireless signal 171 may, for example, be a wireless broadcast signal according to the wireless communication protocol used by the transmission device of the transmitter 110. The wireless signal 171 may carry data that includes the multi-channel digital audio signal generated from the source audio signals and audio metadata generated by the computing device of the transmitter 110.

The audio processing 105 may include any suitable number and arrangement of any suitable components for processing audio signals, implemented in any suitable manner. For example, the audio processing 105 may include a mixer, an equalizer, and an amplifier. The source audio signals received at the mixer may be mixed, for example, setting the relative volumes of each of the channels represented by the source audio signals. The mixer may combine the source audio signals from the audio sources 101 into a mixed audio signal, which may include any suitable number of channels. The mixed audio signal may be analog or digital. For example, the mixer may be a digital mixer, and may convert input analog source audio signals into digital audio signals. The mixed audio signal may be input to the equalizer, which may adjust the volumes of the various sound frequencies in the mixed audio signal. The mixed and equalized audio signal may be input to an amplifier, which may amplify the audio signal and provide the amplified audio signal from the audio processing 105 to the speakers 151, 152, 153, and 154 throughout the venue to be used to generate sound. For example, the speaker 151 may generate sound wave 161, the speaker 152 may generate sound wave 162, the speaker 153 may generate sound wave 163, and the speaker 154 may generate sound wave 164, based on the amplified audio signal provided by audio processing 105.

The source audio signals received at the transmitter 110 may be processed to generate audio metadata to be sent with a multi-channel digital audio signal to the receivers 120, 121, 122, and 123 within the venue. For example, the computing device of the transmitter 110 may generate the multi-channel digital audio signal and the audio metadata from the source audio signals. The multi-channel digital audio signal and the audio metadata may be transmitted as data carried by the wireless signal 171.

The receivers 120, 121, 122, and 123 may be any suitable electronic devices for receiving the wireless signal 171 and for generating sound using the multi-channel digital audio signal from the wireless signal 171. For example, the receivers 120, 121, 122, and 123 may be personal audio devices, such as smartphones, tablets, or dedicated audio players, used by persons in the venue, and may include a wireless communications device, such as, for example, a Wi-Fi radio, for receiving the wireless signal 171 and for communicating wirelessly with the transmitter 110. The receivers 120, 121, 122, and 123 may generate sound based on the received multi-channel digital audio signal. For example, the receivers 120, 121, 122, and 123 may include any suitable combination of hardware and software for driving a speaker based on a digital audio signal. Headphones, which may be wired or wireless, may be connected to the receivers 120, 121, 122, and 123 and may be worn by persons using the receivers 120, 121, 122, and 123. The receivers 120, 121, 122, and 123 may allow persons using them to manipulate the multi-channel digital audio signal. For example, each person may be able to adjust the mixing of the channels and equalization of the frequencies in the multi-channel digital audio signal on their respective one of the receivers 120, 121, 122, and 123. This may allow, for example, each of the persons to change the relative volumes of instruments and vocals, or emphasize or deemphasize frequencies, in the multi-channel digital audio signal, changing the sound generated by the headphones connected to their respective one of the receivers 120, 121, 122, and 123 using the multi-channel digital audio signal. The multi-channel digital audio signal may be buffered or otherwise held in any suitable memory of the receivers 120, 121, 122, and 123 as it is received, for example, through the wireless signal 171, and before the multi-channel digital audio signal is used to generate sound.
This may allow each of the receivers 120, 121, 122, and 123 to speed up or delay playback of the multi-channel digital audio signal independently of each other.

Each of the receivers 120, 121, 122, and 123 may determine its own delay. The delay for an individual one of the receivers 120, 121, 122, and 123 may be, for example, the amount of time it takes one of the sound waves 161, 162, 163, or 164, generated by one of the speakers 151, 152, 153, and 154 and based on the same section of the source audio signals as the audio metadata, to arrive at the location of the receiver after the arrival of the multi-channel digital audio signal converted from the same section of the source audio signals. The delay may be determined based on the audio metadata. The sound wave for which the delay is determined may be, for example, the loudest sound wave at the location of the receiver, which may, for example, be generated by the nearest speaker. For example, the receiver 120 may be nearest to the speaker 151, and the sound wave 161 may arrive at the location of the receiver 120 before any other sound wave from any other speaker in the venue and may be the loudest sound wave at the location of the receiver 120. The delay determined by the receiver 120 may be based on the sound wave 161. In some instances, the loudest sound wave at the location of a receiver may not be generated by the nearest speaker, for example, due to obstructions. The locations of the receivers 120, 121, 122, and 123 may change as the persons using them move around the venue, changing which of the sound waves 161, 162, 163, and 164 is loudest at the location of each of the receivers 120, 121, 122, and 123.

FIG. 2 shows an example system suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter. The transmitter 110 may include a computing device and a transmission device, which may include hardware and software components for the processing of audio signals and the transmission of data. The transmitter 110 may include a sampler 201. The sampler 201 may be any suitable combination of hardware and software for sampling an analog audio signal to generate a digital audio signal, such as, for example, an ADC or multiple ADCs. The sampler 201 may sample the source audio signals, which may be analog signals generated by the audio sources 101. For example, the sampler 201 may sample the source audio signals, which may be, for example, combined into a combined analog audio signal, at 48 kHz, and may generate a digital audio signal. The combined analog audio signal may be sampled continuously as it arrives at the sampler 201 from the audio sources 101, generating a continuous digital audio signal. The sampler 201 may be able to process multiple channels, for example, through a multi-channel ADC or through multiple ADCs. For example, in addition to generating the digital audio signal from the combined analog audio signal, the sampler 201 may continuously generate a multi-channel digital audio signal, which may include a separate channel for each of the source audio signals, and may be sampled at any suitable rate. In some implementations, the sampler 201 may include a separate ADC for each of the source audio signals, and may combine the digital conversions of the source audio signals into a single digital audio signal. The sampler 201 may be a component of the computing device of the transmitter 110.

The transmitter 110 may include an anti-aliasing filter 202. The anti-aliasing filter 202 may be any suitable combination of hardware and software for filtering a digital audio signal. The anti-aliasing filter 202 may be, for example, a hardware filter, or a software-implemented filter. The sampler 201 may output the digital audio signal converted from the combined analog audio signal to the anti-aliasing filter 202. The anti-aliasing filter 202 may filter the digital audio signal with any suitable parameters to filter portions of the digital audio signal continuously as they are generated by the sampler 201. For example, the anti-aliasing filter 202 may use a stop band frequency of 1500 Hz, a ripple of 1 dB, a stop band of −50 dB, and a pass band of 1150 Hz. The anti-aliasing filter 202 may continuously filter the digital audio signal and continuously output a filtered digital audio signal. The anti-aliasing filter 202 may be a component of the computing device of the transmitter 110.

The transmitter 110 may include a down sampler 203. The down sampler 203 may be any suitable combination of hardware and software for down sampling a digital audio signal. The down sampler 203 may be, for example, a hardware or software-implemented down sampler. The anti-aliasing filter 202 may output the filtered digital audio signal to the down sampler 203. The down sampler 203 may down sample the filtered digital audio signal any suitable number of times. For example, the filtered digital audio signal may be down sampled by a factor of 16, resulting in a 3 kHz down sampled digital audio signal. The down sampler 203 may continuously down sample continuous input from the anti-aliasing filter 202 and continuously output the down sampled digital audio signal. The down sampler 203 may be a component of the computing device of the transmitter 110.

The transmitter 110 may include an input array 204. The input array 204 may be implemented in storage of the computing device of the transmitter 110 using a data structure of any suitable size. For example, the input array 204 may be a 2048 element array, and each element may store one of 2048 samples. The samples stored in the input array 204 may represent any suitable length of the down sampled digital audio signal. For example, the samples stored in the input array 204 may represent 682.7 ms of the down sampled digital audio signal. The input array 204 may be stored on the computing device in any suitable manner, in any suitable storage hardware, including volatile and non-volatile storage. The down sampled digital audio signal continuously output from the down sampler 203 may be continuously stored in the input array 204, for example, on a first-in first-out basis, with new samples pushing down older samples and the newest sample causing the oldest sample to exit the input array 204. The input array 204 may be implemented as a first-in first-out queue. When the input array 204 stores 2048 values, each representing a sample, the input array 204 may represent 682.7 ms of the combined analog audio signal at a sampling rate of 3 kHz, down sampled by the down sampler 203 from 32768 samples representing the same 682.7 ms of the combined analog audio signal at a sampling rate of 48 kHz as sampled by the sampler 201. Samples stored in the input array 204 for the down sampled digital audio signal may be output, or accessed, at specified intervals.
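The relationship between the 48 kHz samples, the factor of 16 down sampling, and the 2048 element first-in first-out input array may be sketched as follows. Keeping every 16th sample is an assumed, simple form of down sampling that relies on the signal having already been filtered, and the name down_sample and the integer stand-in samples are hypothetical.

```python
from collections import deque

DOWN_SAMPLE_FACTOR = 16  # 48 kHz / 16 = 3 kHz
ARRAY_SIZE = 2048        # 2048 samples at 3 kHz is about 682.7 ms


def down_sample(filtered, factor=DOWN_SAMPLE_FACTOR):
    """Keep every `factor`-th sample of an already filtered signal."""
    return filtered[::factor]


# A bounded deque behaves as a first-in first-out queue: once full,
# appending a new sample causes the oldest sample to exit.
input_array = deque(maxlen=ARRAY_SIZE)

filtered = list(range(32768))  # stand-in for 682.7 ms of samples at 48 kHz
input_array.extend(down_sample(filtered))
size_when_full = len(input_array)
oldest_when_full = input_array[0]

input_array.append(-1)  # one new sample arrives and the oldest one exits
```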

The transmitter 110 may include an RMS determiner 205. The RMS determiner 205 may be any suitable combination of hardware and software for determining the RMS of a digital audio signal. The RMS determiner 205 may be, for example, implemented as a hardware device, or may be implemented in software on the computing device of the transmitter 110. The RMS determiner 205 may determine the RMS of the down sampled digital audio signal, as stored in the input array 204. For example, the RMS determiner 205 may calculate the value of the RMS of the down sampled digital audio signal based on the 2048 values stored in the input array 204. The value of the RMS may be determined at intervals, such as, for example, at 682.7 ms when the input array 204 is initially filled with the down sampled digital audio signal output from the down sampler 203, and once every 500 ms thereafter, reusing samples representing 182.7 ms of samples from the input array 204 that were used in the previous determination of the value of the RMS. The RMS determiner 205 may, for example, access the samples stored in the input array 204, or may wait and be sent the samples stored in the input array 204, at the appropriate intervals. The value of the RMS may be output by the RMS determiner 205 to be stored, for example, in a transmission buffer 209, and may be part of audio metadata for the portion of the source audio signals, for example, the 682.7 ms of the source audio signals, that were combined, sampled, filtered, and down sampled to produce the down sampled digital audio signal stored in the input array 204. The RMS determiner 205 may be a component of the computing device of the transmitter 110.
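A root mean square calculation over the contents of the input array, together with the arithmetic behind the 500 ms interval, may be sketched as follows. The function name rms and the constant names are hypothetical, and the four-sample input is only a stand-in for the 2048 stored values.

```python
import math

SAMPLE_RATE_HZ = 3000
WINDOW_SAMPLES = 2048
NEW_SAMPLES_PER_INTERVAL = 1500  # 500 ms at 3 kHz
REUSED_SAMPLES = WINDOW_SAMPLES - NEW_SAMPLES_PER_INTERVAL  # 548, about 182.7 ms


def rms(samples):
    """Root mean square of a sequence of sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


example_rms = rms([3, -3, 3, -3])
```

In this sketch, each determination after the first uses 1500 new samples and reuses 548 samples from the previous determination.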

The transmitter 110 may include a Tukey window 206. The Tukey window 206 may be any suitable combination of hardware and software for windowing a digital audio signal. The Tukey window 206 may be, for example, implemented as a hardware device, or may be implemented in software on the computing device of the transmitter 110. The Tukey window 206 may be used to window the down sampled digital audio signal from the input array 204. The Tukey window 206 may use a number of points matching the size of the input array 204. For example, the input array 204 may store 2048 samples for the down sampled digital audio signal, and the Tukey window 206 may use 2048 points. The Tukey window 206 may use a ratio of 0.2. The Tukey window 206 may window the digital audio signal at specified intervals, such as, for example, at 682.7 ms when the input array 204 is initially filled with the down sampled digital audio signal output from the down sampler 203, and once every 500 ms thereafter, reusing samples representing 182.7 ms of samples from the input array 204 that were previously windowed by the Tukey window 206. The Tukey window 206 may operate in synchronization with the RMS determiner 205, so that both the Tukey window 206 and the RMS determiner 205 use the same data from the input array 204, for example, representing the same section of the source audio signals. The Tukey window 206 may output a windowed digital audio signal. The Tukey window 206 may be a component of the computing device of the transmitter 110.
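A Tukey (tapered cosine) window of this kind may be sketched as follows, assuming the ratio of 0.2 is the fraction of the window occupied by the two cosine tapers combined, which is the usual meaning of the Tukey parameter. The function names tukey and apply_window are hypothetical.

```python
import math


def tukey(n_points, ratio=0.2):
    """Tukey window: cosine tapers at both ends, flat (1.0) in the middle."""
    if n_points == 1:
        return [1.0]
    if ratio <= 0:
        return [1.0] * n_points  # no taper: rectangular window
    ratio = min(ratio, 1.0)
    last = n_points - 1
    edge = ratio * last / 2  # length of each taper, in samples
    window = []
    for n in range(n_points):
        if n < edge:
            w = 0.5 * (1 - math.cos(math.pi * n / edge))           # rising taper
        elif n > last - edge:
            w = 0.5 * (1 - math.cos(math.pi * (last - n) / edge))  # falling taper
        else:
            w = 1.0
        window.append(w)
    return window


def apply_window(samples, window):
    """Multiply each sample by the matching window coefficient."""
    return [s * w for s, w in zip(samples, window)]


window = tukey(2048, ratio=0.2)
```

With 2048 points and a ratio of 0.2, roughly the first and last 205 samples are tapered and the remaining samples pass through unchanged.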

The transmitter 110 may include a data flipper 207. The data flipper 207 may be any suitable combination of hardware and software for flipping the data for a digital audio signal. The data flipper 207 may be, for example, implemented as a hardware device, or may be implemented in software on the computing device of the transmitter 110, and may be, for example, instructions for memory mapping. The data flipper 207 may be used to flip the windowed digital audio signal output by the Tukey window 206. The data flipper 207 may reverse the order of the individual samples of the windowed digital audio signal, for example, through memory mapping. The data flipper 207 may output a reversed digital audio signal. The reversed digital audio signal may be the windowed digital audio signal backwards. The data flipper 207 may be a component of the computing device of the transmitter 110.

The transmitter 110 may include a fast Fourier transform 208. The fast Fourier transform 208 may be any suitable combination of hardware and software for performing an FFT on a digital audio signal to generate a DFT for the digital audio signal. The fast Fourier transform 208 may be, for example, implemented as a hardware device, or may be implemented in software, such as signal processing software, on the computing device of the transmitter 110. The fast Fourier transform 208 may transform the reversed digital audio signal output by the data flipper 207 to the frequency domain using a Fourier transform. For example, the fast Fourier transform 208 may implement a 2048 point FFT on the reversed digital audio signal, generating a DFT representation of the reversed digital audio signal. The reversed digital audio signal may be zero-padded. The DFT representation of the reversed digital audio signal may include any suitable number of complex numbers. For example, the DFT representation generated by a 2048 point FFT may include 2048 complex numbers. The DFT representation may be a frequency domain representation of the reversed digital audio signal. The fast Fourier transform 208 may output the DFT representation, or a section of the DFT representation, to be stored in the transmission buffer 209 along with the value of the root mean square output by the RMS determiner 205. The DFT representation may be normalized by dividing the real and imaginary components of every complex number in the DFT representation by the largest magnitude found among all of those real and imaginary components. The fast Fourier transform 208 may generate the DFT representation at the same intervals that the Tukey window 206 windows the down sampled digital audio signal in the input array 204 and the data flipper 207 generates the reversed digital audio signal, for example, after 682.7 ms initially, and then every 500 ms thereafter. The fast Fourier transform 208 may be a component of the computing device of the transmitter 110.
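
As an illustrative, non-limiting sketch, the reversal, transform, and normalization steps might be arranged as follows. For brevity the sketch uses a direct DFT on an 8 sample frame rather than a 2048 point FFT; the two produce the same DFT representation. The function names `dft` and `normalize` and the sample values are hypothetical.

```python
import cmath

def dft(samples):
    """Direct DFT; an FFT would produce the same values more efficiently."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def normalize(spectrum):
    """Scale so the largest real or imaginary component has magnitude 1."""
    peak = max(max(abs(c.real), abs(c.imag)) for c in spectrum)
    if peak == 0:
        return list(spectrum)  # all-zero spectrum: nothing to scale
    return [complex(c.real / peak, c.imag / peak) for c in spectrum]

windowed = [0.0, 1.0, 3.0, 2.0, -1.0, -2.0, 0.5, 0.0]  # hypothetical windowed frame
reversed_signal = windowed[::-1]                        # data flipper step
metadata_dft = normalize(dft(reversed_signal))          # DFT sent as audio metadata
```

After normalization every real and imaginary component lies in the range −1 to 1, which may simplify packing the values into a fixed-width transmission format.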

The transmitter 110 may include the transmission buffer 209. The transmission buffer 209 may be any suitable combination of hardware and software for storing audio metadata before the audio metadata is transmitted by the transmission device of the transmitter 110. The transmission buffer 209 may be, for example, any suitable data structure stored in any suitable volatile or non-volatile storage of the computing device or the transmission device of the transmitter 110. The transmission buffer 209 may receive and store the value of the root mean square output by the RMS determiner 205 and the DFT representation, or section of the DFT representation, of the reversed digital audio signal generated using the fast Fourier transform 208. The value of the root mean square and the DFT representation of the reversed digital audio signal may be received at the transmission buffer 209 at specified intervals, for example, initially after 682.7 ms plus the processing time needed to generate them, and then every 500 ms thereafter. The value of the root mean square and the DFT representation of the reversed digital audio signal stored in the transmission buffer 209 at a given time may have been determined from the same section of the source audio signals, and may represent audio metadata for that section of the source audio signals. The transmission buffer 209 may be a component of the computing device or the transmission device of the transmitter 110.

The transmission device of the transmitter 110 may transmit the audio metadata stored in the transmission buffer 209, for example, using the wireless signal 171. The wireless signal 171 may be a radio signal of any suitable type, including, for example, a Wi-Fi signal, and may carry the DFT representation of the reversed digital audio signal and the value of the root mean square that were stored in the transmission buffer 209. The audio metadata may be transmitted along with the multi-channel digital audio signal generated from the same section of the source audio signals that was used to generate the audio metadata. For example, the multi-channel digital audio signal, output from the sampler 201, and including digital conversions of all of the source audio signals and channel information for the source audio signals for the same 682.7 ms section of the source audio signals that were combined and used to generate the DFT representation and for which the value of the root mean square was determined, may be transmitted along with the audio metadata. The multi-channel digital audio signal and the audio metadata may be transmitted separately using separate wireless communication channels, bandwidth, or frequency. Alternatively, the multi-channel digital audio signal and the audio metadata may be modulated or multiplexed together and transmitted using a single communication channel, bandwidth, or frequency. For example, the multi-channel digital audio signal and the audio metadata may be encoded using a Quadrature Amplitude Modulation (QAM) technique, such as 16-bit QAM.

The audio metadata may be generated, stored in the transmission buffer 209, and transmitted at any suitable rate. For example, while the sampling, alias filtering, and down sampling of the combined analog audio signal may be continuous, the audio metadata may be generated initially after 682.7 ms, and then once every 500 ms. The first transmission may not occur until after 682.7 ms of the combined analog audio signal have been sampled in order to ensure the input array 204 is filled with samples, after which subsequent audio metadata may be generated and transmitted every 500 ms based on a set of samples that includes some samples used to generate the immediately previous audio metadata. For example, the last 182.7 ms of samples used to generate the previous audio metadata may be used to generate the subsequent audio metadata. The reused samples may remain in the input array 204 to be reused, as they may not yet have been pushed out of the input array 204.
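
As an illustrative, non-limiting sketch, the 682.7 ms, 500 ms, and 182.7 ms figures follow from a 2048 sample frame at the 3 kHz down sampled rate. The constant names below are hypothetical.

```python
SAMPLE_RATE_HZ = 3000   # 48 kHz down sampled by a factor of 16
FRAME_SAMPLES = 2048    # size of the transmitter input array
HOP_MS = 500            # interval between successive audio metadata

frame_ms = FRAME_SAMPLES / SAMPLE_RATE_HZ * 1000      # length of one frame
hop_samples = HOP_MS * SAMPLE_RATE_HZ // 1000         # new samples per interval
overlap_samples = FRAME_SAMPLES - hop_samples         # samples reused from the previous frame
overlap_ms = overlap_samples / SAMPLE_RATE_HZ * 1000  # reused duration

print(round(frame_ms, 1), hop_samples, overlap_samples, round(overlap_ms, 1))
# prints: 682.7 1500 548 182.7
```

Each new frame therefore consists of 1500 new samples and 548 samples carried over from the previous frame.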

The receiver 120 may receive the audio metadata and multi-channel digital audio signal from the transmitter 110. For example, the receiver 120 may receive the wireless signal 171, which may carry the audio metadata from the transmission buffer 209 and the multi-channel digital audio signal output by the sampler 201. The receiver 120 may generate sound based on the received multi-channel digital audio signal. The receiver 120 may allow for the manipulation of the multi-channel digital audio signal, including mixing and equalization of the multi-channel digital audio signal, using any suitable interface, such as, for example, a touchscreen, or other input devices. The multi-channel digital audio signal may be buffered or otherwise held in any suitable memory of the receiver 120 as it is received and before being used to generate sound. This may allow playback of the multi-channel digital audio signal by the receiver 120 to be delayed or sped up.

The receiver 120 may include a microphone 210. The microphone 210 may be any suitable hardware device for generating an analog audio signal based on sounds in the surrounding environment. The microphone 210 may generate an analog audio signal based on sounds, such as the sound waves 161, 162, 163, and 164, from the speakers 151, 152, 153, and 154. The sound waves 161, 162, 163, and 164 from the speakers 151, 152, 153, and 154 may be generated based on the audio signal sent to the speakers 151, 152, 153, and 154, for example, from the amplifier of the audio processing 105, and may be based on the copies of the source audio signals that were sent to the mixer of the audio processing 105 from the audio sources 101. The loudest components of the analog audio signal generated by the microphone 210 may be from the sound wave from the speaker closest to the receiver 120, for example, the sound wave 161 from the speaker 151, reinforced by the sound waves 162, 163, and 164 when they arrive in sync with the sound wave 161 and constructively interfere at the location of the microphone 210. The microphone 210 may continuously generate and output the analog audio signal based on sound arriving at the location of the microphone 210.

The receiver 120 may include a sampler 211. The sampler 211 may be any suitable combination of hardware and software for sampling an analog audio signal to generate a digital audio signal, such as, for example, an ADC or multiple ADCs. The sampler 211 may sample the analog audio signal generated by the microphone 210. The sampler 211 may sample the analog audio signal at any suitable rate, such as, for example, at the same 48 kHz rate as the sampler 201, and may generate a digital audio signal. The sampler 211 may be able to process multiple channels, for example, through a multi-channel ADC or through multiple ADCs. For example, the receiver 120 may include more than one microphone, resulting in the generation of more than one analog audio signal. The sampler 211 may continuously generate and output a digital audio signal based on the continuously input analog audio signal from the microphone 210.

The receiver 120 may include an anti-aliasing filter 212. The anti-aliasing filter 212 may be any suitable combination of hardware and software for filtering a digital audio signal. The anti-aliasing filter 212 may be, for example, a hardware filter, or a software-implemented filter. The sampler 211 may output the digital audio signal converted from the analog audio signal generated by the microphone 210 to the anti-aliasing filter 212. The anti-aliasing filter 212 may filter the digital audio signal with any suitable parameters to filter portions of the digital audio signal continuously as they are generated by the sampler 211. For example, the anti-aliasing filter 212 may use a stop band frequency of 1500 Hz, a ripple of 1 dB, a stop band of −50 dB, and a pass band of 1150 Hz. The anti-aliasing filter 212 may continuously filter the digital audio signal and continuously output a filtered digital audio signal.

The receiver 120 may include a down sampler 213. The down sampler 213 may be any suitable combination of hardware and software for down sampling a digital audio signal. The down sampler 213 may be, for example, a hardware or software-implemented down sampler. The anti-aliasing filter 212 may output the filtered digital audio signal to the down sampler 213. The down sampler 213 may down sample the filtered digital audio signal any suitable number of times, such as, for example, the same number of times as the down sampler 203. For example, the filtered digital audio signal may be down sampled by a factor of 16, resulting in a 3 kHz down sampled digital audio signal. The down sampler 213 may continuously down sample continuous input from the anti-aliasing filter 212 and continuously output the down sampled digital audio signal.

The receiver 120 may include an input array 214. The input array 214 may be implemented in storage of the receiver 120 using a data structure of any suitable size. The input array 214 may be larger than the input array 204. For example, the input array 214 may be a 3000 element array, and each element may store one of 3000 samples. The samples stored in the input array 214 may represent any suitable length of the down sampled digital audio signal. For example, the samples stored in the input array 214 may represent 1000 ms of the down sampled digital audio signal. The input array 214 may be stored on the receiver 120 in any suitable manner, in any suitable storage hardware, including volatile and non-volatile storage. The down sampled digital audio signal continuously output from the down sampler 213 may be continuously stored in the input array 214, for example, on a first-in first-out basis, with new samples pushing down older samples and the newest sample causing the oldest sample to exit the input array 214. The input array 214 may be implemented as a first-in first-out queue. When the input array 214 stores 3000 values, each representing a sample, the input array 214 may represent 1000 ms of the analog audio signal generated by the microphone 210 at a sampling rate of 3 kHz, down sampled by the down sampler 213 from 48000 samples representing the same 1000 ms of the same audio signal at 48 kHz. Samples stored in the input array 214 for the down sampled digital audio signal may be output, or accessed, at specified intervals.
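
As an illustrative, non-limiting sketch, a first-in first-out input array of 3000 samples might be kept in a bounded double-ended queue. The use of Python's `collections.deque` and the integer stand-in samples are assumptions made for illustration.

```python
from collections import deque

INPUT_ARRAY_SIZE = 3000  # 1000 ms of audio at 3 kHz

# A bounded deque drops its oldest entry whenever a new one is
# appended to a full queue, giving first-in first-out behavior.
input_array = deque(maxlen=INPUT_ARRAY_SIZE)

# Hypothetical stream of 3500 down sampled values (integers stand in for samples).
for sample in range(3500):
    input_array.append(sample)

print(len(input_array), input_array[0], input_array[-1])
# prints: 3000 500 3499
```

After 3500 samples have arrived, the 500 oldest have exited and the array holds only the most recent 1000 ms.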

The receiver 120 may include a Tukey window 215. The Tukey window 215 may be any suitable combination of hardware and software for windowing a digital audio signal. The Tukey window 215 may be, for example, implemented as a hardware device, or may be implemented in software on the receiver 120. The Tukey window 215 may be used to window the down sampled digital audio signal from the input array 214. The Tukey window 215 may use a number of points matching the size of the input array 204, which may be smaller than the input array 214. For example, the input array 204 may store 2048 samples for the down sampled digital audio signal, and the Tukey window 215 may use 2048 points to window 2048 samples from the input array 214, which may store 3000 samples. A current delay may be used to determine the samples from the input array 214 windowed by the Tukey window 215. For example, with a current delay of 160 ms, samples from the input array 214 representing a section of the down sampled digital audio signal from 160 ms to 842.7 ms may be used. The last sample in the input array 214, for example, the 3000th, may represent the down sampled digital audio signal at 0 ms. The 1st sample in the input array 214 may represent the audio signal at 1000 ms. The Tukey window 215 may use a ratio of 0.2. The Tukey window 215 may window the down sampled digital audio signal at specified intervals. For example, the windowing of the down sampled digital audio signal from the input array 214 may coincide with the receiving at the receiver 120 of a section of specified length of the multi-channel digital audio signal from the transmitter 110. For example, once the receiver 120 begins receiving the multi-channel digital audio signal, the first windowing of the down sampled digital audio signal in the input array 214 may occur after a 682.7 ms section of the multi-channel digital audio signal, and accompanying audio metadata, is received, which may be after 682.7 ms, as the multi-channel digital audio signal may be streamed from the transmitter 110 in real time. Subsequent windowing may occur every 500 ms, after the receiving of 500 ms of the multi-channel digital audio signal and audio metadata generated based on the 500 ms of the source audio signals that were used to generate the 500 ms of the multi-channel digital audio signal along with the immediately previous 182.7 ms of the source audio signals. Subsequent windowing, after the initial windowing, may reuse samples if necessary based on the interval between each windowing and the length of the section of the down sampled digital audio signal that is windowed, for example, using 182.7 ms from the input array 214 that were previously windowed by the Tukey window 215. The Tukey window 215 may output a windowed digital audio signal. The Tukey window 215 may be a component of the receiver 120.

The receiver 120 may include a fast Fourier transform 216. The fast Fourier transform 216 may be any suitable combination of hardware and software for performing an FFT on a digital audio signal to generate a DFT for the digital audio signal. The fast Fourier transform 216 may be, for example, implemented as a hardware device, for example, built-in to a processor, or may be implemented in software, such as signal processing software, on the receiver 120. The fast Fourier transform 216 may transform the windowed digital audio signal output by the Tukey window 215 to the frequency domain using a Fourier transform. For example, the fast Fourier transform 216 may implement a 2048 point FFT on the windowed digital audio signal, generating a DFT representation of the windowed digital audio signal. The DFT representation of the windowed digital audio signal may include any suitable number of complex numbers. For example, the DFT representation generated by a 2048 point FFT may include 2048 complex numbers. The DFT representation may be a frequency domain representation of the windowed digital audio signal. The fast Fourier transform 216 may output the DFT representation of the windowed digital audio signal. The DFT representation may be normalized in the same manner as the DFT representation on the transmitter 110. The fast Fourier transform 216 may generate the DFT representation at the same intervals that the Tukey window 215 windows the down sampled digital audio signal in the input array 214, as the fast Fourier transform 216 may operate at each instance it receives output from the Tukey window 215.

The receiver 120 may include a multiplier 217. The multiplier 217 may be any suitable combination of hardware and software for performing an element-wise multiplication on DFT representations. The multiplier 217 may be, for example, implemented as a hardware device, for example, built-in to a processor, or may be implemented in software on the receiver 120. The multiplier 217 may receive as input the DFT representation of the windowed digital audio signal from the fast Fourier transform 216 and the DFT representation of the reversed digital audio signal from the audio metadata received from the transmitter 110. The multiplier 217 may implement an element-wise multiplication of the DFT representation of the windowed digital audio signal and the DFT representation of the reversed digital audio signal, resulting in a correlation result in the DFT representation. The multiplication of the DFT representations may correspond to a convolution of the time domain representations used to generate the DFT representations, for example, the values of the input array 214 of the receiver 120 and the reversed values of the input array 204 of the computing device of the transmitter 110 for the windowed digital audio signal and the reversed digital audio signal.
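
As an illustrative, non-limiting sketch, the correspondence between element-wise multiplication of DFT representations and convolution in the time domain can be checked numerically. The sketch uses a direct DFT on 8 sample frames in place of a 2048 point FFT, and the convolution it verifies is circular, as is the case for any product of equal-length DFTs. The function names and sample values are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Direct forward or inverse DFT of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def circular_convolution(a, b):
    """Time-domain circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(m - j) % n] for j in range(n)) for m in range(n)]

# Hypothetical frames: the received frame is the reference delayed by 2 samples.
reference = [1.0, -0.5, 0.3, 0.0, -0.1, 0.4, 0.0, 0.2]
received = [0.0, 0.2, 1.0, -0.5, 0.3, 0.0, -0.1, 0.4]
reversed_reference = reference[::-1]  # as sent by the transmitter

# Element-wise product in the frequency domain, then back to the time domain.
product = [p * q for p, q in zip(dft(received), dft(reversed_reference))]
via_dft = [v.real for v in dft(product, inverse=True)]

# The same result computed directly in the time domain.
direct = circular_convolution(received, reversed_reference)
```

Convolving with a sample-reversed sequence amounts to a cross-correlation, which is why the product may be treated as a correlation result.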

The receiver 120 may include PHAT weighting 218. The PHAT weighting 218 may be any suitable combination of hardware and software for performing an element-wise multiplication on DFT representations. The PHAT weighting 218 may be, for example, implemented as a hardware device, for example, built-in to a processor, or may be implemented in software on the receiver 120. The PHAT weighting 218 may apply a phase transform to the correlation result in the DFT representation output by the multiplier 217. The PHAT weighting 218 may implement the phase transform weighting by dividing each complex number of the correlation result in the DFT representation by its own absolute value, generating a PHAT weighted DFT representation. The frequencies represented in the resulting PHAT weighted DFT representation may have their amplitudes set to 1, while phase data for each of the frequencies may be maintained.
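
As an illustrative, non-limiting sketch, the phase transform weighting might divide each complex value by its own absolute value as follows. The guard against division by zero for empty frequency bins, the `eps` threshold, and the function name `phat_weight` are assumptions that do not come from the disclosure.

```python
import cmath

def phat_weight(spectrum, eps=1e-12):
    """Set every nonzero bin to unit magnitude while keeping its phase."""
    weighted = []
    for c in spectrum:
        magnitude = abs(c)
        # Assumed handling: bins with (near) zero magnitude are left at zero.
        weighted.append(c / magnitude if magnitude > eps else 0j)
    return weighted

correlation = [3 + 4j, -2j, 0j, 5 + 0j]  # hypothetical correlation result
weighted = phat_weight(correlation)
print([round(abs(c), 6) for c in weighted])
# prints: [1.0, 1.0, 0.0, 1.0]
```

Removing the amplitude information in this way tends to sharpen the peak found after the inverse transform, since every frequency contributes equally to it.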

The receiver 120 may include an inverse fast Fourier transform 219. The inverse fast Fourier transform 219 may be any suitable combination of hardware and software for performing an inverse FFT on a DFT representation of an audio signal, generating a digital audio signal. The inverse fast Fourier transform 219 may be, for example, implemented as a hardware device, for example, built-in to a processor, or may be implemented in software, such as signal processing software, on the receiver 120. The inverse fast Fourier transform 219 may receive as input the PHAT weighted DFT representation output by the PHAT weighting 218, and may transform the PHAT weighted DFT representation to the time domain. The inverse fast Fourier transform 219 may generate any suitable number of samples for the time domain representation of the PHAT weighted DFT representation. For example, the inverse fast Fourier transform 219 may generate a number of samples corresponding to the number of samples used from the input array 204 of the computing device of the transmitter 110 and the input array 214 of the receiver 120, such as 2048 samples. The time domain representation of the PHAT weighted DFT representation may represent a signal that would be the result of convolving the windowed digital audio signal and the reversed digital audio signal with amplitude information removed.

The receiver 120 may include delay search 220. The delay search 220 may be any suitable combination of hardware and software for searching a digital audio signal generated from an inverse fast Fourier transform of multiplied DFT representations for an amplitude spike to determine a delay between the audio signals used to generate the multiplied DFT representations relative to a current delay. The delay search 220 may be, for example, implemented as a hardware device, for example, built-in to a processor, or may be implemented in software, such as signal processing software, on the receiver 120. The delay search 220 may search the time domain representation of the PHAT weighted DFT representation for the sample with the greatest amplitude. For example, the delay search 220 may perform any suitable search on the values of the samples of the time domain representation of the PHAT weighted DFT representation to determine which sample has the highest value, indicating the greatest amplitude. The position of the sample with the greatest amplitude may indicate the amount by which the analog audio signal generated by the microphone 210 of the receiver 120, and the loudest sound wave, for example, the sound wave 161, from the speakers, for example, the speaker 151, is delayed compared to the multi-channel digital audio signal received from the transmitter 110, relative to the current delay. For example, if the sample with the greatest amplitude is the first or last sample of the time domain representation of the PHAT weighted DFT representation, this may indicate that there is no delay relative to the current delay value. The value of the relative delay may be based on whether the sample with the greatest amplitude is in the first half of samples of the time domain representation of the PHAT weighted DFT representation, indicating a positive relative delay value, or the second half, indicating a negative relative delay value. The magnitude of the relative delay value, whether positive or negative, may increase as the sample with the greatest amplitude approaches the middle sample. If the time domain representation includes more than one sample with the greatest amplitude, any sample with the greatest amplitude may be chosen to determine the delay. The determined relative delay may represent, for example, the amount, in addition to the amount indicated by the current delay, of the section of the source audio signals used to generate the audio metadata which was not received as sound generated by any of the speakers 151, 152, 153, and 154 at the location of the receiver 120 by the time the entire section has been received at the receiver 120 from the transmitter 110, for example, as a multi-channel digital audio signal. For example, a current delay of 10 ms and a relative delay of 10 ms may indicate that at the time the first 682.7 ms of the multi-channel digital audio signal, based on the first 682.7 ms section of the source audio signals, was received at the receiver 120, only 662.7 ms of sound generated by the speakers 151, 152, 153, and 154 based on that first section of the source audio signals has been received at the location of the receiver 120 and used by the microphone 210 to generate an audio signal. The delay search 220 may output the value of the relative delay, or may output an indication of the sample with the greatest amplitude, which may be used by another component of the receiver 120 to determine the relative delay.
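
As an illustrative, non-limiting sketch, the search for the sample with the greatest magnitude and the conversion of its position into a signed relative delay might proceed as follows. The sketch assumes that position 0 corresponds to zero relative delay and that positions in the second half wrap around to negative delays; an actual implementation may need a fixed offset depending on how the reversed reference is aligned. The function name and test signals are hypothetical.

```python
SAMPLE_RATE_HZ = 3000  # down sampled rate; one sample is 1/3 ms

def find_relative_delay_ms(correlated, sample_rate_hz=SAMPLE_RATE_HZ):
    """Map the position of the largest-magnitude sample to a signed delay."""
    n = len(correlated)
    # With ties, max() returns the first such position; any could be chosen.
    peak_index = max(range(n), key=lambda i: abs(correlated[i]))
    # First half: positive delay. Second half: wraps to a negative delay.
    lag = peak_index if peak_index < n // 2 else peak_index - n
    return lag * 1000.0 / sample_rate_hz

late = [0.0] * 2048
late[30] = 1.0            # peak 30 samples into the first half
early = [0.0] * 2048
early[2048 - 30] = 1.0    # peak 30 samples from the end

print(find_relative_delay_ms(late), find_relative_delay_ms(early))
# prints: 10.0 -10.0
```

Under this mapping the largest representable magnitude is reached at the middle sample, consistent with the magnitude of the relative delay increasing toward the middle.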

The receiver 120 may include a controller 221. The controller 221 may be any suitable combination of hardware and software for determining a current delay that may be used to control playback of a digital audio signal by the receiver 120. The controller 221 may be, for example, implemented as a hardware device, for example, a processor or part of a processor, or may be implemented in software on the receiver 120. The controller 221 may implement a histogram. The value of the relative delay output by the delay search 220 may be added to the histogram. The histogram may include any suitable number of bins, and each bin may represent a range of delay values, for example, in milliseconds. For example, the histogram may use an interval of 4 ms, and may have enough bins to represent the entire length of the section of the source audio signals represented by the audio metadata received by the receiver 120. For example, the histogram may include 395 bins, representing delays from −80 ms to 1500 ms, with each bin representing 4 ms. The value of the relative delay may be weighted in any suitable manner before being added to the histogram. For example, the value of the relative delay may be weighted according to the value of the root mean square from the audio metadata. The value of the relative delay may be weighted to 0, and therefore discarded, if the value of the root mean square from the audio metadata is less than 300, weighted to 1 if the value of the root mean square is 300 to 2000, and weighted to 2 if the value of the root mean square is above 2000. Discarded relative delay values may have been determined based on sections of the source audio signals which contain little or no sound, resulting in a very low root mean square value, and indicating a lack of activity from the audio sources.
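
As an illustrative, non-limiting sketch, the example RMS weighting and the mapping of a delay value to one of the 395 bins might be written as follows. Clamping of out-of-range delays to the first or last bin is an assumption, as are the function and constant names.

```python
BIN_WIDTH_MS = 4
MIN_DELAY_MS = -80
MAX_DELAY_MS = 1500
NUM_BINS = (MAX_DELAY_MS - MIN_DELAY_MS) // BIN_WIDTH_MS  # 1580 / 4 = 395

def rms_weight(rms_value):
    """Weight for a relative delay based on the RMS from the audio metadata."""
    if rms_value < 300:
        return 0  # near-silent section: discard
    if rms_value <= 2000:
        return 1
    return 2

def bin_index(delay_ms):
    """Index of the 4 ms bin containing delay_ms (clamped; an assumption)."""
    index = int((delay_ms - MIN_DELAY_MS) // BIN_WIDTH_MS)
    return min(max(index, 0), NUM_BINS - 1)

print(NUM_BINS, rms_weight(2500), bin_index(2.5))
# prints: 395 2 20
```

Bin 20 spans 0 ms to 4 ms, since the first 20 bins cover −80 ms up to 0 ms.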

Any suitable number of relative delay values may be used in the histogram. For example, the histogram may use 26 relative delay values, which may be replaced on a first-in first-out basis. The relative delay values used in the histogram may be stored on the receiver 120 in any suitable manner, in any suitable volatile or non-volatile storage which may be accessible to the controller 221. The values of the relative delays used in the histogram may be weighted for recency. For example, the histogram may use 26 relative delay values, with the 13 most recent relative delay values weighted by a factor of 2 in addition to any weighting of those relative delay values based on the values of their corresponding root mean squares, and the 13 oldest relative delay values may be weighted by a factor of 1 in addition to any weighting of those relative delay values based on the values of their corresponding root mean squares. For example, the most recent value for a relative delay to enter the histogram may have a corresponding root mean square of 2500, resulting in the value for the relative delay being weighted by a factor of 4. The histogram may also use any other suitable weightings for relative delay values. When a relative delay value is added to the histogram, the count for the bin of the histogram corresponding to the relative delay value may be increased according to the weighting of the relative delay. For example, the most recent value for a relative delay to enter the histogram may be 2.5 ms, and may have a weighting of 4, resulting in the count for the 0 ms to 4 ms bin of the histogram increasing by 4. As new relative delay values enter the histogram, counts from the oldest relative delay value, which falls out of the histogram, may be removed from the appropriate bin, and changes in recency weightings may result in changes to the counts of any bins of the histogram.
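
As an illustrative, non-limiting sketch, the histogram counts might be rebuilt from a first-in first-out history of 26 relative delay values each time a new value arrives, which applies the recency weighting and removes the oldest value in one step. Rebuilding rather than incrementally updating is a design choice made for this sketch, and the names and history contents are hypothetical.

```python
from collections import Counter, deque

HISTORY_SIZE = 26
RECENT_COUNT = 13
BIN_WIDTH_MS = 4

def build_histogram(history):
    """history: (relative_delay_ms, rms_weight) pairs, oldest first."""
    entries = list(history)
    counts = Counter()
    for position, (delay_ms, rms_weight) in enumerate(entries):
        # The 13 most recent entries count double.
        is_recent = position >= len(entries) - RECENT_COUNT
        recency_weight = 2 if is_recent else 1
        # Key each bin by its lower edge in milliseconds.
        bin_start = int(delay_ms // BIN_WIDTH_MS) * BIN_WIDTH_MS
        counts[bin_start] += rms_weight * recency_weight
    return counts

history = deque(maxlen=HISTORY_SIZE)  # oldest value drops out automatically
for _ in range(25):
    history.append((5.0, 1))          # hypothetical earlier values, RMS weight 1
history.append((2.5, 2))              # most recent value, RMS above 2000

histogram = build_histogram(history)
print(histogram[0], histogram[4])
# prints: 4 37
```

The most recent value contributes 2 × 2 = 4 to the 0 ms to 4 ms bin, while the 4 ms to 8 ms bin receives 13 from the older half of the history and 24 from the twelve recent values of 5.0 ms.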

The controller 221 may use the histogram to determine an adjustment to the current delay. For example, the controller 221 may determine an adjustment to the current delay based on the counts for the various bins of the histogram and the delay values to which the bins correspond. The adjustment to the current delay may be determined in any suitable manner. For example, the histogram bin with the highest count may be used as the adjustment to the current delay, based on, for example, the average delay represented by that histogram bin. For example, a histogram may have a count of 16 in the 0 ms to 4 ms bin, a count of 28 in the 4 ms to 8 ms bin, a count of 10 in the 8 ms to 12 ms bin, and a count of 6 in the 12 ms to 16 ms bin. The adjustment to the current delay may be 6 ms, based on the 4 ms to 8 ms bin having the highest count of any bin in the histogram. The controller 221 may output the adjustment to the current delay, which may be used to adjust the current delay.
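
As an illustrative, non-limiting sketch, selecting the bin with the highest count and using the midpoint of that bin as the adjustment might be done as follows. Representing the histogram as a mapping from each bin's lower edge to its count is an assumption made for illustration.

```python
BIN_WIDTH_MS = 4

def delay_adjustment_ms(histogram):
    """histogram maps the lower edge of each bin (ms) to its count."""
    best_bin_start = max(histogram, key=histogram.get)
    # Use the average delay represented by the bin, i.e. its midpoint.
    return best_bin_start + BIN_WIDTH_MS / 2

# Counts from the example: 16, 28, 10, and 6 in four consecutive bins.
histogram = {0: 16, 4: 28, 8: 10, 12: 6}
print(delay_adjustment_ms(histogram))
# prints: 6.0
```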

The current delay, as adjusted based on the adjustment determined by the histogram, may be used to adjust the data from the input array 214 used when determining the next relative delay value. For example, the input array 214 may store 3000 samples, of which 2048 samples are used to determine a relative delay value. The samples may represent 1000 ms of audio signal. When determining the next relative delay value, the samples windowed using the Tukey window 215 may be samples starting at the current delay value and going up to 682.7 ms from the current delay value. For example, if the current delay value is 0 ms, which it may be initially, the samples from the input array 214 that are windowed may be from 0 ms, which may be the oldest sample in the input array 214, to 682.7 ms. If the current delay value is 160 ms, the samples that are windowed may be from 160 ms to 842.7 ms. This may allow the relative delay value to be determined relative to the current delay value as adjusted based on the histogram.
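
As an illustrative, non-limiting sketch, the 2048 samples to be windowed might be selected from the 3000 sample input array by converting the current delay to a sample offset. The sketch assumes the samples are held in a list whose index 0 corresponds to 0 ms; the orientation of an actual input array may differ. The function name and the error handling for delays that do not fit in the array are hypothetical.

```python
SAMPLE_RATE_HZ = 3000
FRAME_SAMPLES = 2048

def select_window(input_array, current_delay_ms):
    """Return the 2048 samples that start at the current delay."""
    start = round(current_delay_ms * SAMPLE_RATE_HZ / 1000)
    end = start + FRAME_SAMPLES
    if start < 0 or end > len(input_array):
        raise ValueError("current delay does not fit in the input array")
    return input_array[start:end]

samples = list(range(3000))  # integers stand in for 1000 ms of samples
frame = select_window(samples, 160)
print(frame[0], frame[-1], len(frame))
# prints: 480 2527 2048
```

A current delay of 160 ms corresponds to an offset of 480 samples, so the selected frame ends at sample 2528, or about 842.7 ms.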

The receiver 120 may include an audio processor 222. The audio processor 222 may be any suitable combination of hardware and software for processing a multi-channel digital audio signal. The audio processor 222 may be, for example, implemented as a hardware device, for example, a special purpose hardware device, or may be implemented in software on the receiver 120. The audio processor 222 may receive the multi-channel digital audio signal from the transmitter 110, and may process the multi-channel digital audio signal in any suitable manner. For example, the audio processor 222 may mix and equalize the multi-channel digital audio signal based on preset mixing instructions or equalization settings, or based on input from a user of the receiver 120. The audio processor 222 may combine channels in the multi-channel digital audio signal, for example, to produce a stereo digital audio signal that may be suitable for playback over a two-channel sound generating device, such as a pair of headphones. The audio processor 222 may output the digital audio signal, for example, the stereo digital audio signal, generated by processing the multi-channel digital audio signal.

The receiver 120 may include a resampler 223. The resampler 223 may be any suitable combination of hardware and software for resampling a digital audio signal. The resampler 223 may be, for example, implemented as a hardware device, for example, a special purpose hardware device such as an audio asynchronous resampler, or may be implemented in software on the receiver 120. The resampler 223 may receive the current delay from the controller 221, for example, as adjusted based on the histogram, and may receive the digital audio signal, for example, the stereo digital audio signal, output by the audio processor 222. The resampler 223 may use the current delay to adjust the generation of sound by the receiver 120 based on the multi-channel digital audio signal received from the transmitter 110. For example, the current delay may indicate the amount of time by which the multi-channel digital audio signal from the transmitter 110 is ahead of the analog audio signal being generated by the microphone 210 based on the sound wave 161 from the speaker 151. To synchronize the sound a person using the receiver 120 hears from the speaker 151 with the sound the person would hear using headphones connected to the receiver 120, the digital audio signal output by the audio processor 222 which is used to generate sound through the headphones may be delayed, or sped up, according to the current delay.

For example, the resampler 223 may resample a digital audio signal received from the audio processor 222 and generated based on the multi-channel digital audio signal received from the transmitter 110 before the digital audio signal is used to generate sound. The resampling may slow down the playback of the digital audio signal, for example, by duplicating samples, in order to increase the amount of time over which a section of the digital audio signal is used to generate sound. This may cause an increase in playback time for a section of the digital audio signal, for example, causing a section that originally represented 100 ms of audio to take 102 ms to play back, implementing a delay of 2 ms. The resampler 223 may implement resampling to delay the playback of the digital audio signal in any suitable manner, and over any suitable period of time. The resampler 223 may avoid implementing too great a delay over too short a period of time, as the resampling may cause audio artifacts or noticeable changes in pitch in the generated sound. The resampler 223 may also speed up the playback of the digital audio signal, for example, by dropping samples, and may also avoid implementing too great a speed up over too short a time period.
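As an illustrative sketch only, and not as the disclosed implementation, slowing playback by duplicating samples may be expressed in Python as follows. The function name `stretch_by_duplication`, the even spacing of the duplicated samples, and the 1 kHz sample rate (so that one sample represents 1 ms) are assumptions made for illustration.

```python
def stretch_by_duplication(samples, extra):
    """Return a copy of `samples` lengthened by `extra` duplicated samples.

    The duplicated positions are spread evenly across the section so the
    slowdown is gradual rather than concentrated at one point.
    """
    n = len(samples)
    extra = min(extra, n)  # at most one duplicate per original sample
    if extra <= 0:
        return list(samples)
    # Indices whose sample is emitted twice, spaced roughly n / extra apart.
    duplicate_at = {(i * n) // extra + n // (2 * extra) for i in range(extra)}
    out = []
    for index, value in enumerate(samples):
        out.append(value)
        if index in duplicate_at:
            out.append(value)
    return out


section = list(range(100))                    # 100 ms at the assumed 1 kHz rate
stretched = stretch_by_duplication(section, 2)
print(len(section), len(stretched))           # section length before and after
```

Under the assumed rate, the two duplicated samples lengthen the 100 ms section to 102 ms. A practical resampler may interpolate rather than duplicate in order to reduce audible artifacts.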

The current delay may be the total delay needed at the time the current delay is determined, and previous delays implemented by the resampler 223 may count toward this total. For example, the first current delay, as adjusted by the controller 221, may be 10 ms. The resampler 223 may have delayed playback of the digital audio signal output by the audio processor 222 by 5 ms when the controller 221 adjusts the current delay to 9 ms. Because playback of the digital audio signal is already delayed by 5 ms, the resampler 223 may only delay playback of the digital audio signal by an additional 4 ms, bringing the total delay to 9 ms, matching the current delay. If the controller 221 then adjusts the current delay away from 9 ms, for example, to 7 ms or 10 ms, the resampler 223 may delay or speed up playback of the digital audio signal accordingly, for example, implementing a 2 ms speed up to bring the total delay to 7 ms, or implementing an additional 1 ms delay to bring the total delay to 10 ms. The current delay may change, for example, due to a person moving and changing the distance between the microphone 210 of the receiver 120 and the nearest speaker, for example, the speaker 151, or due to changes in the transmission environment for the transmitter 110. When the total delay matches the most recent current delay, the resampler 223 may allow the digital audio signal to play back without delaying or speeding up the playback of the digital audio signal.

The resampler 223 may output a resampled digital audio signal to any suitable components of the receiver 120 for playback of the digital audio signal through a sound generating device, such as speakers or headphones. For example, the resampler 223 may output the resampled digital audio signal to a DAC, which may convert the digital audio signal to an analog audio signal. The analog audio signal may then be amplified, for example, by an operational amplifier of the receiver 120, before being output to a sound generating device. When the total delay implemented by the resampler 223 matches the current delay indicated by the controller 221, the sound generated by the sound generating device may be synchronized with sound from, for example, the sound wave 161 generated by the speaker 151. A person using the sound generating device of the receiver 120 may hear sounds generated by the audio sources 101 through the speaker 151 and through the sound generating device of the receiver 120 at approximately the same time.

FIG. 3A shows an example arrangement suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter. At the transmitter 110, an audio signal 301 may be sampled and filtered, for example, with the sampler 201 and the anti-aliasing filter 202. The audio signal 301 may be, for example, an analog representation of sound over any suitable time period in time domain 310. For example, the audio signal 301 may represent sound from the audio sources 101 over 10 ms. The audio signal 301 may be continuously sampled at any suitable sample rate and may then be filtered to prevent aliasing, generating a filtered digital audio signal 302. The filtered digital audio signal 302 may be a digital representation of the audio signal 301, and the sound represented by the audio signal 301, in time domain 320. The filtered digital audio signal 302 may include any suitable number of samples at any suitable sampling rate. For example, the filtered digital audio signal 302 may include 40 samples to represent the 10 ms of the audio signal 301, at a sampling rate of 4 kHz.

The filtered digital audio signal 302 may be continuously down sampled, for example, by the down sampler 203. The filtered digital audio signal 302 may be down sampled by any suitable factor, in any suitable manner. For example, the filtered digital audio signal 302 may be down sampled by a factor of 5, generating a down sampled digital audio signal 303. The down sampled digital audio signal 303 may be a digital representation of the audio signal 301, and the sound represented by the audio signal 301, in the time domain 320. The down sampled digital audio signal 303 may include any suitable number of samples at any suitable sampling rate. For example, the down sampled digital audio signal 303 may include 8 samples to represent the 10 ms of the audio signal 301, at a sampling rate of 800 Hz.
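The arithmetic of the example above may be sketched in Python as follows. This is a minimal illustration that assumes plain decimation (keeping every fifth sample) of an already filtered signal; the function name `downsample` and the placeholder sample values are hypothetical.

```python
def downsample(samples, factor):
    """Keep every `factor`-th sample (plain decimation of a filtered signal)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return samples[::factor]


filtered = list(range(40))               # placeholder for 40 samples: 10 ms at 4 kHz
down_sampled = downsample(filtered, 5)
new_rate_hz = 4000 // 5
print(len(down_sampled), new_rate_hz)    # sample count and rate after down sampling
```

Keeping one sample in five leaves 8 samples for the same 10 ms, which corresponds to the 800 Hz rate given in the example.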

The down sampled digital audio signal 303 may be windowed at any suitable interval using any suitable window, with any suitable parameters. For example, the down sampled digital audio signal 303 may be windowed using the Tukey window 206, generating a windowed digital audio signal 304. The windowed digital audio signal 304 may be a digital representation of the audio signal 301, and the sound represented by the audio signal 301, in the time domain 320.
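One way a Tukey (tapered cosine) window may be computed and applied is sketched below in Python. The taper fraction `alpha=0.5` and the function names are assumptions made for illustration; the disclosure allows any suitable window parameters.

```python
import math


def tukey_window(length, alpha=0.5):
    """Tapered cosine window: cosine ramps at both ends, flat (1.0) between."""
    if length < 1:
        return []
    if length == 1 or alpha <= 0:
        return [1.0] * length
    alpha = min(alpha, 1.0)
    edge = alpha * (length - 1) / 2.0       # width of each cosine ramp
    window = []
    for n in range(length):
        distance = min(n, length - 1 - n)   # distance to the nearest end
        if distance < edge:
            window.append(0.5 * (1.0 - math.cos(math.pi * distance / edge)))
        else:
            window.append(1.0)
    return window


def apply_window(samples, window):
    """Multiply each sample by the window coefficient at the same position."""
    return [s * w for s, w in zip(samples, window)]


window = tukey_window(8)
windowed = apply_window([1.0] * 8, window)
```

The window tapers the first and last samples of a section toward zero, which may reduce spectral leakage when the section is later transformed to the frequency domain.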

The windowed digital audio signal 304 may be flipped, for example, by the data flipper 207, reversing the order of the samples of the windowed digital audio signal 304. For example, the windowed digital audio signal 304 may be flipped in the memory of the transmitter 110 through memory mapping operations, generating a reversed digital audio signal 305. The reversed digital audio signal 305 may be a digital representation of the reverse of the audio signal 301, and the sound represented by the reverse of the audio signal 301, in the time domain 320.

The reversed digital audio signal 305 may be transformed into the frequency domain, for example, using the fast Fourier transform 208. For example, the reversed digital audio signal 305 may be processed using any suitable Fourier transform function, using any suitable parameters, generating a DFT representation 306 of the reversed digital audio signal 305. The DFT representation 306 may be a digital representation of the reverse of the audio signal 301, and the sound represented by the reverse of the audio signal 301, in the frequency domain 330.

The DFT representation 306 may be normalized to any suitable norm, for example, using the fast Fourier transform 208. For example, the DFT representation 306 may be normalized so that the maximum magnitude of any real or imaginary component among all of the complex numbers in the DFT representation 306 is 1, by dividing each real and imaginary component of the DFT representation 306 by the absolute value of the real or imaginary component with the largest magnitude in the DFT representation 306, generating a normalized DFT representation 307. The normalized DFT representation 307 may be a digital representation of the reverse of the audio signal 301, and the sound represented by the reverse of the audio signal 301, in the frequency domain 330.
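The normalization described above may be sketched in Python as follows. The function name `normalize_dft` and the handling of an all-zero or empty input are assumptions made for illustration.

```python
def normalize_dft(bins):
    """Scale complex bins so the largest real or imaginary component is +/-1."""
    if not bins:
        return []
    peak = max(max(abs(z.real), abs(z.imag)) for z in bins)
    if peak == 0:
        return list(bins)                 # all-zero input: nothing to scale
    return [complex(z.real / peak, z.imag / peak) for z in bins]


normalized = normalize_dft([complex(2, -4), complex(1, 0.5)])
print(normalized)
```

In this example the component with the largest magnitude is the imaginary part -4, so every real and imaginary component is divided by 4.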

FIG. 3B shows an example arrangement suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter. At the receiver 120, an audio signal 351 may be sampled and filtered, for example, with the sampler 211 and the anti-aliasing filter 212. The audio signal 351 may be, for example, an analog representation of sound over any suitable time period in time domain 310. For example, the audio signal 351 may represent sound from the audio sources 101 over 10 ms, as recorded by the microphone 210 of the receiver 120 based on sound waves 161, 162, 163, and 164 as received from the speakers 151, 152, 153, and 154. The audio signal 351 and the audio signal 301 may both be based on sound generated by the audio sources 101 over the same time period. The audio signal 351 may be continuously sampled at any suitable sample rate and may then be filtered to prevent aliasing, generating a filtered digital audio signal 352. The filtered digital audio signal 352 may be a digital representation of the audio signal 351, and the sound represented by the audio signal 351, in time domain 320. The filtered digital audio signal 352 may include any suitable number of samples at any suitable sampling rate. For example, the filtered digital audio signal 352 may include 40 samples to represent the 10 ms of the audio signal 351, at a sampling rate of 4 kHz.

The filtered digital audio signal 352 may be continuously down sampled, for example, by the down sampler 213. The filtered digital audio signal 352 may be down sampled by any suitable factor, in any suitable manner. For example, the filtered digital audio signal 352 may be down sampled by a factor of 5, generating a down sampled digital audio signal 353. The down sampled digital audio signal 353 may be a digital representation of the audio signal 351, and the sound represented by the audio signal 351, in the time domain 320. The down sampled digital audio signal 353 may include any suitable number of samples at any suitable sampling rate. For example, the down sampled digital audio signal 353 may include 8 samples to represent the 10 ms of the audio signal 351, at a sampling rate of 800 Hz.

The down sampled digital audio signal 353 may be windowed at any suitable interval using any suitable window, with any suitable parameters. For example, the down sampled digital audio signal 353 may be windowed using the Tukey window 215, generating a windowed digital audio signal 354. The windowed digital audio signal 354 may be a digital representation of the audio signal 351, and the sound represented by the audio signal 351, in the time domain 320.

The windowed digital audio signal 354 may be transformed into the frequency domain, for example, using the fast Fourier transform 216. For example, the windowed digital audio signal 354 may be processed using any suitable Fourier transform function, using any suitable parameters, generating a DFT representation 356 of the windowed digital audio signal 354. The DFT representation 356 may be a digital representation of the audio signal 351, and the sound represented by the audio signal 351, in the frequency domain 330.

The DFT representation 356 may be normalized to any suitable norm, for example, using the fast Fourier transform 216. For example, the DFT representation 356 may be normalized so that the maximum magnitude of any real or imaginary component among all of the complex numbers in the DFT representation 356 is 1, by dividing each real and imaginary component of the DFT representation 356 by the absolute value of the real or imaginary component with the largest magnitude in the DFT representation 356, generating a normalized DFT representation 357. The normalized DFT representation 357 may be a digital representation of the audio signal 351, and the sound represented by the audio signal 351, in the frequency domain 330.

FIG. 3C shows an example arrangement suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter. The DFT representation 307 may be transmitted to the receiver 120, for example, as part of the audio metadata transmitted by the transmitter 110 to the receiver 120. The DFT representation 307 may be multiplied with the DFT representation 357, for example, by the multiplier 217. For example, the DFT representation 307 and the DFT representation 357 may be multiplied element-wise, generating a correlation result in the DFT representation 361. The correlation result in the DFT representation 361 may be a digital representation of the cross-synthesis of the sound represented by the reverse of the audio signal 301 with the sound represented by the audio signal 351 in the frequency domain 330.

The correlation result in the DFT representation 361 may be phase transformed, for example, by the PHAT weighting 218. For example, the correlation result in the DFT representation 361 may be phase transformed by dividing each complex number of the correlation result in the DFT representation 361 by its own absolute value, generating a PHAT weighted DFT representation 362. The PHAT weighted DFT representation 362 may be a digital representation of the cross-synthesis of the sound represented by the reverse of the audio signal 301 with the sound represented by the audio signal 351, with amplitude information removed, in the frequency domain 330.
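The element-wise multiplication and the PHAT weighting may be sketched in Python as follows. The function names and the small threshold `eps`, which guards against division by zero for empty frequency bins, are assumptions that are not part of the disclosure.

```python
def multiply_elementwise(first, second):
    """Multiply two DFT representations bin by bin."""
    if len(first) != len(second):
        raise ValueError("DFT representations must have the same length")
    return [a * b for a, b in zip(first, second)]


def phat_weight(bins, eps=1e-12):
    """Divide each complex bin by its own magnitude, keeping only its phase."""
    weighted = []
    for z in bins:
        magnitude = abs(z)
        # Bins with (near) zero magnitude carry no usable phase.
        weighted.append(z / magnitude if magnitude > eps else 0j)
    return weighted


product = multiply_elementwise([1 + 1j, 2j], [3 + 1j, 0j])
weighted = phat_weight(product)
```

After weighting, every non-empty bin has unit magnitude, so only the phase relationship between the two signals remains.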

The PHAT weighted DFT representation 362 may be transformed into the time domain, for example, by the inverse fast Fourier transform 219. For example, the PHAT weighted DFT representation 362 may be processed using any suitable inverse Fourier transform function, using any suitable parameters, generating correlated digital signal 363. The correlated digital signal 363 may be a digital representation of the cross-synthesis of the sound represented by the reverse of the audio signal 301 with the sound represented by the audio signal 351, with amplitude information removed, in the time domain 320. The correlated digital signal 363 may include any suitable number of samples. For example, the correlated digital signal 363 may include 8 samples.

Any suitable samples from the correlated digital signal 363 may be used in any suitable manner to determine a relative delay between the audio signal 301 and the audio signal 351, for example, by the delay search 220. The position of the sample with the greatest amplitude may be subtracted from the number of remaining samples and multiplied by the amount of time represented by each sample to determine the relative delay. The sample with the greatest amplitude of the 8 remaining samples may be the 7th sample. The location of the sample with the greatest amplitude in the correlated digital signal 363 may indicate that the audio signal 351 is ahead of the audio signal 301. The relative delay between the audio signal 301 and the audio signal 351 may be used in conjunction with a current delay to determine how much to delay the playback of an audio signal related to the audio signal 301 on the receiver 120 in order to synchronize with the sound that was used to generate the audio signal 351, for example, by the controller 221. The accuracy of the determined relative delay may be limited by the number of samples in the correlated digital signal 363. Graph 364 may be a visual representation of the correlated digital signal 363.
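The path from the two time-domain sections to the position of the peak may be sketched end to end in Python as follows. This sketch makes several assumptions that are not in the disclosure: a naive O(n^2) DFT stands in for the fast Fourier transforms 208, 216, and 219, windowing and normalization are omitted, the sample values are invented, and the receiver section is modeled as a circular shift of the transmitter section. The names `dft` and `locate_peak` are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def locate_peak(tx_samples, rx_samples):
    """Return (position, samples_ahead) for two equal-length sections.

    position is the 1-based index of the largest-magnitude sample of the
    correlated signal; samples_ahead is the sample count minus position.
    """
    n = len(tx_samples)
    tx_dft = dft(tx_samples[::-1])    # transmitter side: flip, then transform
    rx_dft = dft(rx_samples)          # receiver side: transform only
    product = [a * b for a, b in zip(tx_dft, rx_dft)]
    # PHAT weighting: keep phase, discard magnitude (guarding against zero).
    weighted = [z / abs(z) if abs(z) > 1e-12 else 0j for z in product]
    correlated = [z.real for z in dft(weighted, inverse=True)]
    peak_index = max(range(n), key=lambda i: abs(correlated[i]))
    position = peak_index + 1
    return position, n - position


tx = [0.9, -0.4, 0.3, 0.7, -0.8, 0.1, 0.5, -0.2]   # invented 8-sample section
rx = tx[1:] + tx[:1]            # receiver section one sample ahead (circular)
position, ahead = locate_peak(tx, rx)
relative_delay_ms = ahead * 1.25                   # 1.25 ms per sample at 800 Hz
```

With these inputs the peak falls on the 7th of 8 samples, matching the example above, and subtracting the position from the sample count gives one sample, or 1.25 ms at 800 Hz. Reversing the transmitter section before transforming is what turns the element-wise product, which by itself corresponds to circular convolution, into a correlation.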

FIG. 4 shows an example procedure suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter. At 400, an audio signal may be pre-processed. For example, source audio signals generated by the audio sources 101 may be pre-processed at the computing device of the transmitter 110. The pre-processing may include, for example, sampling of the audio signal, which may be analog, by the sampler 201, filtering of the resulting digital audio signal by the anti-aliasing filter 202, and down sampling of the resulting filtered digital audio signal by the down sampler 203. An audio signal that is continuously input to the computing device of the transmitter 110 may be continuously pre-processed, and the resulting signal from pre-processing may be stored, for example, in the input array 204, which may be a first-in first-out data structure. Any suitable number of samples, representing any suitable length of the audio signal input to the computing device of the transmitter 110, may be stored.

At 402, a root mean square of the audio signal may be determined. For example, the RMS determiner 205 may determine the root mean square of a section of the audio signal, for example, as stored in the input array 204. The RMS determiner 205 may determine the root mean square at specified intervals, for example, every 500 ms. For example, the input array 204 may store samples representing a 682.7 ms section of the audio signal, and the RMS determiner 205 may determine a first root mean square after 682.7 ms, and then may determine additional root mean squares every 500 ms thereafter, using samples representing the previous 682.7 ms of the audio signal. The root mean square may be stored, for example, in the transmission buffer 209 as part of the audio metadata for the section of the audio signal for which it was determined.
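The schedule described above, a first root mean square once the input array is full and further values at a fixed interval afterward, may be sketched in Python as follows. The class name `RmsDeterminer` and the tiny window and interval sizes (4 and 2 samples) are hypothetical; the number of samples that represents 682.7 ms or 500 ms depends on a sample rate that is not assumed here.

```python
import math
from collections import deque


def root_mean_square(samples):
    """Square root of the mean of the squared sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class RmsDeterminer:
    """Emits an RMS when the window first fills, then every `hop` new samples."""

    def __init__(self, window, hop):
        self.buffer = deque(maxlen=window)   # first-in first-out input array
        self.hop = hop
        self.since_last = 0
        self.started = False

    def push(self, sample):
        """Store one sample; return an RMS when one is due, otherwise None."""
        self.buffer.append(sample)
        if len(self.buffer) < self.buffer.maxlen:
            return None
        if not self.started:
            self.started = True
            return root_mean_square(self.buffer)
        self.since_last += 1
        if self.since_last == self.hop:
            self.since_last = 0
            return root_mean_square(self.buffer)
        return None


determiner = RmsDeterminer(window=4, hop=2)
results = [determiner.push(s) for s in [1, 1, 1, 1, 3, 3]]
```

Each emitted value covers the most recent full window, so consecutive windows overlap whenever the interval is shorter than the window, as with the 682.7 ms window and 500 ms interval in the example above.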

At 404, the audio signal may be reversed. For example, the data flipper 207 may reverse the ordering of the samples in the digital audio signal generated by pre-processing the audio signal. The data flipper 207 may, for example, change the memory mapping of the digital audio signal. The audio signal may be reversed at intervals, for example, at the same intervals, and at the same time, that the root mean square is determined. The section of the audio signal that is reversed may be the same section of the audio signal for which the root mean square was determined. Before being reversed, the audio signal may be windowed using any suitable window, such as, for example, the Tukey window 206.

At 406, a discrete Fourier transform of the audio signal may be generated. For example, the reversed digital audio signal generated by the data flipper 207 may be transformed to the frequency domain using the fast Fourier transform 208, which may use any suitable parameters to generate a discrete Fourier transform representation. The discrete Fourier transform representation may be stored, for example, in the transmission buffer 209 along with the root mean square as part of the audio metadata for the same section of the audio signal for which the root mean square was determined. The discrete Fourier transform representation may also be normalized, for example, dividing each component of each complex number of the discrete Fourier transform representation by the absolute value of the component with the greatest magnitude among all the complex numbers of the discrete Fourier transform. The discrete Fourier transform may be generated at intervals, for example, the same intervals at which the root mean square is determined, and after the audio signal is reversed. Completion of reversal of the audio signal may trigger the generation of the discrete Fourier transform.

At 408, the discrete Fourier transform representation and root mean square may be transmitted. For example, the discrete Fourier transform representation and root mean square stored in the transmission buffer 209 may be transmitted wirelessly as audio metadata by the transmission device of the transmitter 110, for example, using the wireless signal 171. The transmission may be a wireless broadcast using any suitable wireless protocol, and the audio metadata may be encoded for transmission in any suitable format. The wireless signal 171, carrying the audio metadata, along with the section of the audio signal for which the audio metadata was determined, may be received at various receivers, such as the receivers 120, 121, 122, and 123, which may be, for example, personal audio devices used by persons in a venue. The transmission of the discrete Fourier transform representation and the root mean square may occur at intervals, for example, the same intervals at which the root mean square is determined, and after both the root mean square and the discrete Fourier transform representation are stored in the transmission buffer 209. The audio signal may be streamed continuously from the transmitter 110.

FIG. 5 shows an example procedure suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter. At 500, an audio signal may be generated from environmental sound. For example, the microphone 210 of the receiver 120 may generate an audio signal based on sound at the location of the microphone 210. The sound may include, for example, sound waves 161, 162, 163, and 164 from the speakers 151, 152, 153, and 154, which may be based on an audio signal originating with the audio sources 101 and processed through the audio processing 105. The audio signal may be generated continuously as sound arrives at the microphone 210.

At 502, a discrete Fourier transform and root mean square may be received. For example, audio metadata including a discrete Fourier transform representation and root mean square for a section of the audio signal originating with the audio sources 101 may be received at the receiver 120 through wireless signal 171 from the transmission device of the transmitter 110.

At 504, the audio signal may be pre-processed. For example, the audio signal generated based on environmental sound, for example, by the microphone 210, may be pre-processed by the receiver 120. The pre-processing may include, for example, sampling of the audio signal, which may be analog, by the sampler 211, filtering of the resulting digital audio signal by the anti-aliasing filter 212, and down sampling of the resulting filtered digital audio signal by the down sampler 213. An audio signal that is continuously input to the receiver 120 may be continuously pre-processed, and the resulting signal from pre-processing may be stored, for example, in an input array of the receiver 120, which may be a first-in first-out data structure. Any suitable number of samples, representing any suitable length of the audio signal input to the receiver 120, may be stored. The signal resulting from pre-processing may also be windowed, for example, by the Tukey window 215. Windowing with the Tukey window 215 may occur at specified intervals, such as, for example, after the entirety of a section of audio signals based on a section of the source audio signals is received at the receiver 120 along with its corresponding audio metadata.

At 506, a discrete Fourier transform of the audio signal may be generated. For example, the pre-processed digital audio signal, which may be a windowed digital audio signal, may be transformed to the frequency domain using the fast Fourier transform 216, which may use any suitable parameters to generate a discrete Fourier transform representation. The discrete Fourier transform representation may be normalized, for example, dividing each component of each complex number of the discrete Fourier transform representation by the absolute value of the component with the greatest magnitude among all the complex numbers of the discrete Fourier transform.

At 508, the generated discrete Fourier transform may be multiplied with the received discrete Fourier transform to generate a correlation result in a discrete Fourier transform representation. For example, the multiplier 217 may multiply the discrete Fourier transform representation generated using the fast Fourier transform 216 on the receiver 120 with the discrete Fourier transform representation received as part of the audio metadata from the transmitter 110. The result of the multiplication may be a correlation result in a discrete Fourier transform representation. The correlation result in a discrete Fourier transform representation may also be weighted. For example, PHAT weighting may be applied to the correlation result in a discrete Fourier transform representation, dividing each complex number by its own absolute value, removing amplitude information while preserving phase information.

At 510, a correlated signal may be generated from the correlation result in a discrete Fourier transform representation. For example, a correlated audio signal may be generated from the correlation result in a discrete Fourier transform representation, which may be PHAT weighted, using the inverse fast Fourier transform 219. The correlated signal may be a digital signal and may include any suitable number of samples, for example, depending on the number of samples used to generate the discrete Fourier transform representations on the receiver 120 and the transmitter 110.

At 512, a relative delay may be determined from the correlated signal. For example, the delay search 220 may determine the sample of the correlated signal with the greatest magnitude. The position of the sample with the greatest magnitude may indicate the magnitude and sign of the relative delay. For example, if the sample with the greatest magnitude is in the first half of the correlated signal, the relative delay may be positive; otherwise, if the sample is in the second half of the correlated signal, the relative delay may be negative. If the sample with the greatest magnitude is the first or last sample of the correlated signal, the relative delay may be zero. The accuracy of the determined relative delay value may be limited by, for example, the granularity of the samples. For example, if each sample represents 4 ms of the audio signal, a relative delay of 2 ms may not be determined.

FIG. 6 shows an example procedure suitable for audio synchronization and delay estimation according to an implementation of the disclosed subject matter. At 600, a relative delay may be added to a histogram. For example, the controller 221 may receive a relative delay value output by the delay search 220, and may add the relative delay value to a set of values used in the running histogram. The running histogram may be based on any suitable number of relative delay values, and may include any suitable number of bins with any suitable granularity. For example, the granularity of the histogram bins may match the granularity of the samples of the correlated audio signal. The values in the histogram may be weighted in any suitable manner. For example, a relative delay value may be weighted according to the root mean square that was included in the audio metadata along with the discrete Fourier transform representation that was used to determine the relative delay value. A relative delay value may also be weighted according to recency.

At 602, an adjustment to a current delay may be determined from the histogram. For example, the controller 221 may determine an adjustment to the current delay by determining the bin of the histogram with the highest count. The delay represented by the bin of the histogram with the highest count may be added to the current delay, which may initially be zero before any relative delays have been determined, to adjust the current delay. The current delay may also be initially set based on a known distance between, for example, the receiver 120 and a speaker, such as the speaker 151.
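The histogram-based adjustment may be sketched in Python as follows. The class name `DelayHistogram`, the 1.25 ms bin width, and the use of the root mean square directly as the weight are assumptions made for illustration; weighting by recency and limiting the histogram to a running set of recent values are omitted.

```python
from collections import defaultdict


class DelayHistogram:
    """Weighted histogram of relative delay values, keyed by bin index."""

    def __init__(self, bin_width_ms):
        self.bin_width_ms = bin_width_ms
        self.bins = defaultdict(float)

    def add(self, relative_delay_ms, weight=1.0):
        """Add one relative delay, for example weighted by its section's RMS."""
        index = round(relative_delay_ms / self.bin_width_ms)
        self.bins[index] += weight

    def mode_ms(self):
        """Delay represented by the bin with the highest (weighted) count."""
        if not self.bins:
            return 0.0
        index = max(self.bins, key=self.bins.get)
        return index * self.bin_width_ms


def adjust_current_delay(current_delay_ms, histogram):
    """Add the delay of the fullest bin to the current delay."""
    return current_delay_ms + histogram.mode_ms()


histogram = DelayHistogram(bin_width_ms=1.25)
histogram.add(2.5, weight=0.8)
histogram.add(2.5, weight=0.6)
histogram.add(-1.25, weight=1.0)
current_delay_ms = adjust_current_delay(0.0, histogram)
```

Taking the fullest bin rather than the latest value may make the adjustment less sensitive to a single spurious correlation peak, since an outlier has to recur, or carry a large weight, before it changes the result.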

At 604, playback of an audio signal may be adjusted based on the current delay. For example, the receiver 120 may receive an audio signal from the transmitter 110, which may be based on audio signals originating with the audio sources 101. The audio signal may be, for example, a multi-channel digital audio signal. The audio signal may be buffered or otherwise stored on the receiver 120 for playback through a sound generating device, such as headphones, connected to the receiver 120. The audio signal may be delayed or sped up based on the current delay as adjusted by the controller 221, and on a total amount of time by which the audio signal has already been delayed. For example, if the current delay is 6 ms, and the audio signal has not yet been delayed at all, the resampler 223 may attempt to delay the audio signal by 6 ms before the next current delay is received from the controller 221, slowing down playback of the audio signal through, for example, duplicated samples. If after the next adjustment by the controller 221, the current delay is 8 ms, and the resampler 223 successfully delayed playback of the audio signal by 6 ms after receiving the previous current delay of 6 ms, the resampler 223 may delay playback of the audio signal by an additional 2 ms. The resampler 223 may also speed up playback of the audio signal, for example, by dropping samples. For example, if after delaying playback of the audio signal by 8 ms, the current delay is adjusted to 5 ms, the resampler 223 may speed up playback of the audio signal by 3 ms. The resampler 223 may avoid delaying or speeding up the audio signal too much over too short a time period to avoid introducing audio artifacts or pitch changes. For example, if the current delay is 15 ms, and playback of the audio signal has not yet been delayed at all, the resampler 223 may only be able to delay playback of the audio signal by 8 ms before the next adjustment of the current delay is made by the controller 221.
This may allow the resampler 223 to gradually synchronize playback on the receiver 120 of an audio signal based on audio signals from the audio sources 101 with the arrival of sound generated by speakers, such as the speakers 151, 152, 153, and 154, based on the same audio signals from the audio sources 101, at the location of the receiver 120, even as a person with the receiver 120 moves. This may prevent a person from hearing audio generated by the receiver 120, for example, through headphones, being echoed by the sound generated by the speakers 151, 152, 153, and 154.
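The bounded adjustment made between successive updates from the controller 221 may be sketched in Python as follows. The function name `next_adjustment_ms` is hypothetical, and the 8 ms limit is taken from the example above rather than being a required value.

```python
def next_adjustment_ms(current_delay_ms, applied_delay_ms, max_step_ms):
    """Delay (positive) or speed up (negative) to apply before the next update.

    The needed change is the current delay minus the delay already applied,
    limited to +/- max_step_ms to avoid artifacts or audible pitch changes.
    """
    needed = current_delay_ms - applied_delay_ms
    return max(-max_step_ms, min(max_step_ms, needed))


print(next_adjustment_ms(6, 0, 8))    # nothing applied yet, 6 ms needed
print(next_adjustment_ms(8, 6, 8))    # 6 ms already applied, 8 ms needed
print(next_adjustment_ms(5, 8, 8))    # 8 ms already applied, only 5 ms needed
print(next_adjustment_ms(15, 0, 8))   # needed change exceeds the per-update limit
```

The four calls correspond to the four examples in the text: a 6 ms delay, an additional 2 ms delay, a 3 ms speed up, and a delay limited to 8 ms out of the 15 ms needed.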

Embodiments of the presently disclosed subject matter may be implemented in and used with a variety of component and network architectures. FIG. 7 is an example computer system 20 suitable for implementing embodiments of the presently disclosed subject matter. The computer 20 includes a bus 21 which interconnects major components of the computer 20, such as one or more processors 24, memory 27 such as RAM, ROM, flash RAM, or the like, an input/output controller 28, and fixed storage 23 such as a hard drive, flash storage, SAN device, or the like. It will be understood that other components may or may not be included, such as a user display such as a display screen via a display adapter, user input interfaces such as controllers and associated user input devices such as a keyboard, mouse, touchscreen, or the like, and other components known in the art to use in or in conjunction with general-purpose computing systems.

The bus 21 allows data communication between the central processor 24 and the memory 27. The RAM is generally the main memory into which the operating system and application programs are loaded. The ROM or flash memory can contain, among other code, the Basic Input-Output system (BIOS) which controls basic hardware operation such as the interaction with peripheral components. Applications resident with the computer 20 are generally stored on and accessed via a computer readable medium, such as the fixed storage 23 and/or the memory 27, an optical drive, external storage mechanism, or the like.

Each component shown may be integral with the computer 20 or may be separate and accessed through other interfaces. Other interfaces, such as a network interface 29, may provide a connection to remote systems and devices via a telephone link, wired or wireless local- or wide-area network connection, proprietary network connections, or the like. For example, the network interface 29 may allow the computer to communicate with other computers via one or more local, wide-area, or other networks, as shown in FIG. 8.

Many other devices or components (not shown) may be connected in a similar manner, such as document scanners, digital cameras, auxiliary, supplemental, or backup systems, or the like. Conversely, all of the components shown in FIG. 7 need not be present to practice the present disclosure. The components can be interconnected in different ways from that shown. The operation of a computer such as that shown in FIG. 7 is readily known in the art and is not discussed in detail in this application. Code to implement the present disclosure can be stored in computer-readable storage media such as one or more of the memory 27, fixed storage 23, remote storage locations, or any other storage mechanism known in the art.

FIG. 8 shows an example arrangement according to an embodiment of the disclosed subject matter. One or more clients 10, 11, such as local computers, smart phones, tablet computing devices, remote services, and the like may connect to other devices via one or more networks 7. The network may be a local network, wide-area network, the Internet, or any other suitable communication network or networks, and may be implemented on any suitable platform including wired and/or wireless networks. The clients 10, 11 may communicate with one or more computer systems, such as processing units 14, databases 15, and user interface systems 13. In some cases, clients 10, 11 may communicate with a user interface system 13, which may provide access to one or more other systems such as a database 15, a processing unit 14, or the like. For example, the user interface 13 may be a user-accessible web page that provides data from one or more other computer systems. The user interface 13 may provide different interfaces to different clients, such as where a human-readable web page is provided to web browser clients 10, and a computer-readable API or other interface is provided to remote service clients 11. The user interface 13, database 15, and processing units 14 may be part of an integral system, or may include multiple computer systems communicating via a private network, the Internet, or any other suitable network. Processing units 14 may be, for example, part of a distributed system such as a cloud-based computing system, search engine, content delivery system, or the like, which may also include or communicate with a database 15 and/or user interface 13. In some arrangements, an analysis system 5 may provide back-end processing, such as where stored or acquired data is pre-processed by the analysis system 5 before delivery to the processing unit 14, database 15, and/or user interface 13. For example, a machine learning system 5 may provide various prediction models, data analysis, or the like to one or more other systems 13, 14, 15.

The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit embodiments of the disclosed subject matter to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to explain the principles of embodiments of the disclosed subject matter and their practical applications, to thereby enable others skilled in the art to utilize those embodiments as well as various embodiments with various modifications as may be suited to the particular use contemplated.

1. A method comprising: pre-processing an audio signal at a transmitter to generate a transmitter pre-processed audio signal comprising samples, each sample comprising a value and having a position in the transmitter pre-processed audio signal; reversing, by the transmitter, the positions of the samples of the transmitter pre-processed audio signal to generate a reversed audio signal; generating, by the transmitter, a transmitter discrete Fourier transform representation from the reversed audio signal; transmitting, by the transmitter, at least a section of the transmitter discrete Fourier transform representation to a receiver as audio metadata; receiving, by the receiver, the audio metadata comprising the at least a section of the transmitter discrete Fourier transform representation; pre-processing, by the receiver, a second audio signal; generating, by the receiver, a receiver discrete Fourier transform representation from the pre-processed second audio signal; generating, by the receiver, a correlation result in a discrete Fourier transform representation based on an element-wise multiplication of the at least a section of the transmitter discrete Fourier transform representation and the receiver discrete Fourier transform representation; performing, by the receiver, an inverse Fourier transform on the correlation result in a discrete Fourier transform representation to generate a correlated signal comprising one or more samples, each sample of the correlated signal having a position in the correlated signal and comprising a value; determining, by the receiver, a relative delay value based on the position in the correlated signal of a sample comprising a value with the greatest magnitude of the values of the samples of the correlated signal; and adjusting, by the receiver, playback of a third audio signal based on a current delay value adjusted based on the relative delay value.
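
For illustration only, the steps recited in claim 1 may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `dft`, `transmitter_metadata`, and `relative_delay` are hypothetical, the pre-processing steps are omitted, a naive discrete Fourier transform stands in for a fast Fourier transform, and both signals are zero-padded to twice their length (an assumption not recited in the claim) so that the element-wise product corresponds to a linear rather than circular correlation. The test signal, a 13-element Barker sequence, is likewise an assumption chosen because its correlation peak is unambiguous.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform, or its inverse."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def transmitter_metadata(samples):
    """Transmitter side: reverse the sample positions, zero-pad, take the DFT."""
    reversed_samples = list(reversed(samples))
    # Assumption: pad to 2n so the later product is a linear correlation.
    return dft(reversed_samples + [0.0] * len(samples))

def relative_delay(metadata, received):
    """Receiver side: estimate by how many samples `received` lags the
    transmitter's signal (negative if it leads)."""
    n = len(received)
    if len(metadata) != 2 * n:
        raise ValueError("metadata must cover twice the receiver frame length")
    receiver_dft = dft(list(received) + [0.0] * n)
    # Element-wise multiplication of the two DFT representations.
    product = [a * b for a, b in zip(metadata, receiver_dft)]
    # The inverse transform yields the correlated signal.
    correlated = dft(product, inverse=True)
    # Position of the sample whose value has the greatest magnitude.
    peak = max(range(len(correlated)), key=lambda i: abs(correlated[i]))
    # Because the transmitter reversed its samples, zero lag sits at n - 1.
    return peak - (n - 1)

# Hypothetical test frames: a Barker-13 pattern, delayed by 5 samples.
barker = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
sent = barker + [0] * 19                  # 32-sample transmitter frame
heard = [0] * 5 + barker + [0] * 14       # same pattern, 5 samples later
print(relative_delay(transmitter_metadata(sent), heard))  # prints 5
```

In such a sketch, a receiver might then add the returned relative delay value to its current delay value before scheduling playback; the exact form of that adjustment is an assumption, as the claim recites only that the current delay value is adjusted based on the relative delay value.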